ABSTRACT

This report describes progress in the development of the area classification portion of a computer vision system for cloud pattern analysis. The ultimate goal of the vision system is to extract meteorologically significant cloud regions from a time sequence of dual-channel geosynchronous satellite images. The question explored in this paper is to what extent single-stage and multistage statistical pattern recognition techniques can be employed to classify clouds from a single dual-channel image.
1. Introduction

The success or failure of global numerical weather prediction models hinges on two basic factors: adequate formulation of the system of hydrodynamic and thermodynamic equations modeling the dynamics of the atmosphere, and accurate determination of the set of initial variables input to the numerical prediction model. A detailed description of numerical weather prediction models can be found in [1].

This set of variables includes horizontal wind velocity estimates for each pressure height (vertical level) of the forecasting model. A global set of wind velocity estimates can be obtained only by supplementing weather station estimates (available primarily for the Northern Hemisphere), extracted from radiosonde data, aircraft reports, and dropsonde data, with automatically extracted wind velocity estimates obtained by observing cloud motion in consecutive pairs of geostationary satellite images. The subject of this report is the design, implementation, and testing of an automatic cloud classification system for preprocessing the geostationary satellite images used in wind velocity estimation. Currently, this classification is done by meteorologists with the aid of time-sequence movie loops in addition to the pairs of images.

Deterioration in the quality of automatic wind velocity estimates can often be traced either to incorrect estimation of the variables relating infrared measurements to cloud-top temperatures or to violation of the basic assumption that the observed cloud motion corresponds to the horizontal wind flow. References [2], [3], [4], and [5] discuss the problem in further detail. Identifying and then rejecting satellite image areas that contain cumulonimbus clouds eliminates from consideration for wind velocity estimation a major non-advective cloud type whose motion is affected by strong vertical currents.

Variables that affect both classification accuracy and the accuracy of converting infrared measurements into cloud-top temperatures are the size of the cloud elements, the number of breaks or holes in the cloud elements, the location of the cloud elements relative to the viewing angle of the sensor, the presence of atmospheric gases such as water vapor, carbon dioxide, and ozone along the radiation path, and the opaqueness of the cloud elements. If the cloud elements do not entirely fill the field of view of the sensor, or if there are breaks or holes in the clouds smaller than the sensor's resolution, the satellite-derived temperature will be warmer than the actual temperature. Identical cloud patterns viewed at different angles may appear to have different temperature profiles. When viewing a cloud pattern at a direct angle, the sensor may measure a larger proportion of surface radiation in the image window than when viewing it at an oblique angle, thus making the cloud pattern viewed directly appear warmer than the same pattern viewed obliquely. Radiation from the sea surface and from clouds is absorbed and re-emitted at a lower temperature by several atmospheric gases along the path from the sea surface or clouds to the satellite sensor. The longer the radiation path, or the higher the concentration of water vapor (which is particularly high in the tropics) and other gases, the colder the satellite-derived temperature will appear to be. If the absorptivity of these gases is negligible, and if the clouds are continuous and opaque to terrestrial radiation (i.e., the emissivity $\varepsilon$ is assumed to be unity), then the satellite-derived temperature closely approximates the cloud-top temperature.

Cloud emissivity ($\varepsilon$) is a variable that must be estimated when relating the radiance or temperature ($B_T$) measured by infrared satellite sensors to the temperature ($B_c$) of a cloud. The vertical location of a wind vector estimated from the movement of cloud c is determined by entering the temperature $B_c$ and the location $P_c$ of the cloud into the National Meteorological Center (NMC) data base of vertical temperature profiles (stored on an IBM 360/195 system). The pressure altitude, rounded to the nearest 10 mb, becomes the vertical location of the wind vector. The temperature $B_c$ of the cloud is related to the infrared satellite reading $B_T$ for cloud c by the following equation:

$$B_c = \frac{B_T - B_s(1 - \varepsilon)}{\varepsilon}$$

where

$B_T$ = radiance at the satellite

$B_c$ = radiance from the cloud

$B_s$ = radiance from the surface below the cloud (sea surface and/or underlying clouds)

$\varepsilon$ = emissivity of cloud c

When the emissivity is unity, the satellite-derived temperature equals the cloud-top temperature. The range of possible values of $\varepsilon$ is greatest for cirrus clouds, whose emissivity can range from 0.35 up to unity.
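The relation between satellite and cloud radiance can be read forward as $B_T = \varepsilon B_c + (1 - \varepsilon) B_s$ and inverted for the cloud radiance. The sketch below is a minimal illustration of that inversion, assuming radiances in arbitrary but consistent units; the function name `cloud_radiance` and the sample numbers are hypothetical. It shows that $\varepsilon = 1$ returns the satellite reading unchanged, while emissivities toward the 0.35 lower bound quoted for cirrus move the estimate strongly.

```python
def cloud_radiance(b_t: float, b_s: float, emissivity: float) -> float:
    """Invert B_T = eps * B_c + (1 - eps) * B_s for the cloud radiance B_c.

    b_t: radiance measured at the satellite (B_T)
    b_s: radiance from the surface below the cloud (B_s)
    emissivity: cloud emissivity eps, assumed to lie in (0, 1]
    """
    if not 0.0 < emissivity <= 1.0:
        raise ValueError("emissivity must lie in (0, 1]")
    return (b_t - b_s * (1.0 - emissivity)) / emissivity


# Hypothetical radiances (arbitrary units): a warm surface seen through a cloud.
surface = 10.0
measured = 8.0
for eps in (1.0, 0.7, 0.35):
    print(eps, round(cloud_radiance(measured, surface, eps), 3))
```

With these numbers the inferred cloud radiance is 8.0, about 7.14, and about 4.29 for the three emissivities: the smaller the assumed $\varepsilon$, the colder the inferred cloud top, so an error in the emissivity assumed for cirrus carries directly into the pressure altitude assigned to the wind vector.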

The choice of cloud tracers for cirrus clouds should be restricted to sufficiently opaque patches that appear white in both visible and infrared images. The problem of restricting cloud tracers is not as acute for middle clouds (where emissivity values typically range between 0.7 and unity) or for low clouds (which generally have high emissivity values).

Identifying areas that contain predominantly cirrus clouds alerts the wind extraction program to check the opacity of cloud tracers in both visible and infrared data. Areas containing multi-layered clouds present additional processing problems to a wind extraction program, in that the movement of cloud elements within the window cannot be integrated into one wind velocity vector. The task of the classification program in preprocessing satellite VISSR (Visible and Infrared Spin Scan Radiometer) data is summarized in Table 1.

The four classes of clouds enumerated in Table 1 are:

Class 1 -- cumulonimbus clouds

Class 2 -- low clouds, i.e., cumulus, cumulus congestus, stratocumulus, and stratus

Class 3 -- cirrus clouds

Class 4 -- mixed clouds: cirrus with low clouds, cirrus with middle clouds, and cirrus with low and middle clouds

An area on a satellite image was classified as Class 1 if cumulonimbus clouds occurred in any part of the area, regardless of their amount and of whether cumulus or cirriform clouds were also present. An area was classified as Class 2 if predominantly single-layered low clouds were present, and as Class 3 if predominantly single-layered cirrus clouds were present. An area was classified as Class 4 if it contained multiple-layered cloud elements.

This research evaluates the ability of various features and statistical classification methods to separate sample areas selected from VISSR images into the above four classes. A sub-problem, the design of a pattern classification system to separate samples from Classes 1, 2, and 3, was also investigated. Section 2 describes the characteristics of the NOAA-1 satellite data set. Section 3 describes the feature selection phase of the pattern classification study, and Section 4 describes its classifier selection phase. Section 5 presents conclusions and plans for further research.

A pattern recognition system may be decomposed into three parts:

1) choice of decision logic structure, i.e., whether all classes are to be separated at once or whether one or more classes are to be separated sequentially from the remaining classes;

2) choice of classifier(s); and

3) choice of features.

Eight different decision trees, representing alternatives for sequential cloud pattern identification, were examined during the course of this study. Four different classifiers (maximum likelihood, multiclass one-against-the-rest, multiclass voting, and Fisher using sample "a priori" probabilities) were applied at various nodes of the decision tree structures. A total of 334 feature statistics (46 first-order statistics and 288 second-order, or texture, statistics) were extracted from 243 sample cloud observations. The sample cloud observations were divided into four classes labelled "low" clouds, "mix" clouds, "cirrus" clouds, and "cumulonimbus" clouds. For each feature, a measure of class separation, the Fisher distance, was calculated for each of the six two-class combinations. This measure was used as a guide in selecting the combinations of features assigned to a given decision tree node.
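The Fisher distance is applied in Section 3.3, but its exact definition is not reproduced in this part of the report. The sketch below assumes the common univariate form $(m_i - m_j)^2 / (s_i^2 + s_j^2)$, computed per feature for each of the $\binom{4}{2} = 6$ class pairs; the function names and the synthetic data are hypothetical.

```python
from itertools import combinations

import numpy as np

CLASSES = ("low", "mix", "cirrus", "cumulonimbus")


def fisher_distance(x_i: np.ndarray, x_j: np.ndarray) -> float:
    """Univariate Fisher distance (m_i - m_j)^2 / (s_i^2 + s_j^2) for one feature."""
    num = (x_i.mean() - x_j.mean()) ** 2
    den = x_i.var() + x_j.var()
    if den == 0:
        return 0.0 if num == 0 else float("inf")
    return float(num / den)


def pairwise_fisher(features: dict[str, np.ndarray]) -> dict[tuple[str, str], np.ndarray]:
    """features[c] is an (n_c, n_features) array for class c.

    Returns one distance per feature for each of the six two-class combinations.
    """
    return {
        (a, b): np.array([fisher_distance(features[a][:, k], features[b][:, k])
                          for k in range(features[a].shape[1])])
        for a, b in combinations(CLASSES, 2)
    }


# Hypothetical data: feature 0 separates the classes, feature 1 is pure noise.
rng = np.random.default_rng(0)
data = {c: np.column_stack([rng.normal(10.0 * i, 1.0, 50), rng.normal(0.0, 1.0, 50)])
        for i, c in enumerate(CLASSES)}
dists = pairwise_fisher(data)
```

Ranking features by these six per-pair distances is one way to pick, for a given tree node, the features that best separate the classes that node must split.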

Maximum likelihood two-level classification of the four classes, using a selected combination of seven features at the first level and six features at the second level, resulted in 91.4% of the samples being correctly classified. If areas containing "mix" samples could be processed by scene analysis techniques, the above four-class problem would reduce to the three-class problem of identifying "low", "cirrus", and "cumulonimbus" clouds.

Maximum likelihood single-level classification of these three classes using only two features resulted in 98.7% of the samples being correctly classified.
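The form of the maximum likelihood classifier is not spelled out in this section. A minimal sketch, assuming multivariate normal class-conditional densities and equal class priors (a common choice, not confirmed by the text), is shown below; the class name `GaussianMLClassifier` and the synthetic two-feature data are hypothetical. In a two-level tree, one such classifier would be fitted at each node on that node's selected features.

```python
import numpy as np


class GaussianMLClassifier:
    """Assign each sample to the class with the largest Gaussian log-likelihood."""

    def fit(self, X: np.ndarray, y: np.ndarray) -> "GaussianMLClassifier":
        self.classes_ = np.unique(y)
        self.params_ = []
        for c in self.classes_:
            Xc = X[y == c]
            mean = Xc.mean(axis=0)
            # Small ridge keeps the covariance invertible for near-degenerate features.
            cov = np.atleast_2d(np.cov(Xc, rowvar=False)) + 1e-6 * np.eye(X.shape[1])
            _, logdet = np.linalg.slogdet(cov)
            self.params_.append((mean, np.linalg.inv(cov), logdet))
        return self

    def log_likelihoods(self, X: np.ndarray) -> np.ndarray:
        cols = []
        for mean, inv_cov, logdet in self.params_:
            d = X - mean
            mahalanobis = np.einsum("ij,jk,ik->i", d, inv_cov, d)
            # The constant -p/2 * log(2*pi) is dropped; it is identical for every class.
            cols.append(-0.5 * (mahalanobis + logdet))
        return np.column_stack(cols)

    def predict(self, X: np.ndarray) -> np.ndarray:
        return self.classes_[np.argmax(self.log_likelihoods(X), axis=1)]


# Hypothetical three-class, two-feature example.
rng = np.random.default_rng(1)
centers = {"low": (0.0, 0.0), "cirrus": (5.0, 0.0), "cumulonimbus": (0.0, 5.0)}
X = np.vstack([rng.normal(c, 1.0, size=(100, 2)) for c in centers.values()])
y = np.repeat(list(centers), 100)
clf = GaussianMLClassifier().fit(X, y)
accuracy = float((clf.predict(X) == y).mean())
```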
2. Characteristics of the Digital Satellite Data and Class Category Map for Sample Tropical Cloud Patterns


2.1 Description of NOAA-1 Visible and Infrared Ingest

The digitized satellite data for the sample cloud patterns analyzed in this study resulted from analog-to-digital processing of scanning radiometer signals received from the NOAA-1 spacecraft on May 3, 1971 (Orbit 1798).

The NOAA-1 polar-orbiting satellite was in operation from December 11, 1970 until August 19, 1971. NOAA-1 was launched into an approximately 790 n.mi. sun-synchronous orbit; i.e., the orbital plane precessed about the Earth's polar axis in the same direction and at the same average rate as the Earth's annual revolution about the sun, thereby minimizing annual variation in the satellite sun angle. The ascending node crossing (the longitude at which the satellite crosses the Equator from south to north) was at 1500 hours local mean solar time.

The prime imagery sensors of NOAA-1 are two-channel scanning radiometers sensitive to energy in the 0.52 to 0.72 µm visible spectrum and in the 10.5 to 12.5 µm atmospheric infrared "window". Energy is gathered by a 5-inch elliptical scan mirror set at an angle of 45° to the scan axis and rotating at 48 rpm. The rotating mirror provides an optical scan of a 3,622 n.mi. long area of the earth perpendicular to the direction of spacecraft motion. The infrared window radiation is collected into a thermistor bolometer, the size of which (5.3 milliradians) defines the Instantaneous Field of View.

The visible energy is detected by a silicon photovoltaic detector, the size of which, together with its field stop, limits the Instantaneous Field of View for visual data to approximately 2.8 milliradians. Since the mirror rotates at a constant angular rate, the geometric resolution of the ground field of view decreases as the distance from the subsatellite point increases. Resolution near the subpoint is approximately 4 n.mi. for the infrared channel, decreasing to 8 n.mi. by 12 n.mi. where the zenith angle (the angle between the normal to the earth's surface and the satellite) is beyond 60°, and approximately 2 n.mi. for the visible channel, decreasing to 4 n.mi. by 8 n.mi. for a zenith angle beyond 60°. At the subpoint, successive infrared channel scan lines are contiguous, and they overlap as the distance from the subpoint increases. There is a 2 n.mi. gap between visible channel data lines at the subpoint; this gap disappears at distances of more than 750 n.mi. from the subpoint. Descriptions of the NOAA-1 spacecraft and of the operational products available from ITOS scanning radiometer data can be found in [3], [6], [7], and [8].
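The quoted subpoint resolutions follow, to a first approximation, from the orbit altitude and the instantaneous fields of view. A quick consistency check, assuming the small-angle approximation (ground footprint ≈ altitude × IFOV in radians) directly below the satellite and neglecting earth curvature; the constant and function names are hypothetical:

```python
ALTITUDE_NMI = 790.0  # approximate NOAA-1 orbit altitude quoted in the text
IFOV_RAD = {"infrared": 5.3e-3, "visible": 2.8e-3}  # instantaneous fields of view


def nadir_footprint_nmi(altitude_nmi: float, ifov_rad: float) -> float:
    """Small-angle ground footprint directly below the satellite."""
    return altitude_nmi * ifov_rad


for channel, ifov in IFOV_RAD.items():
    print(f"{channel}: {nadir_footprint_nmi(ALTITUDE_NMI, ifov):.1f} n.mi.")
```

This gives about 4.2 n.mi. for the infrared channel and 2.2 n.mi. for the visible channel, in line with the approximately 4 and 2 n.mi. quoted above. Away from the subpoint the slant range grows and the footprint is projected obliquely, which is why the resolution degrades at large zenith angles.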

The process of converting raw scanner signals into raw ingest data available on tape is described in [8]. Electrical and thermal or brightness calibrations are then applied to the data (see Appendix B of [8]). Further corrections, such as correction of the infrared data for atmospheric attenuation ("limb darkening" correction) as a function of local zenith angle, or sun normalization correction of the visual data to compensate for differences in solar illumination, were not made. For the data selected for this study, the effects of atmospheric attenuation and varying sun angle were considered minimal and were therefore neglected. From each scan line, only the central 1120 points, for which no curvature correction and no limb darkening correction were considered necessary, were chosen. Also, sun glint posed no problem for this particular area of data.

The region of data selected consists of a pair of infrared and visual data arrays of 1120 x 960 points, covering an area of approximately 25° of latitude and 30° of longitude over the tropical eastern Pacific Ocean west and south of Baja California. Latitude limits are 26.7° N to 1.1° S. Values from the visible spectrum represent measurements of albedo ranging from 0 (black) to 255 (white). Values from the infrared spectrum represent effective blackbody radiative temperature measurements ranging from 160.0 K (white) to 330.0 K (black). The infrared values were rounded to the nearest degree and re-scaled by a shift of -160. To obtain the pictures shown in Figs. 1 and 2, each scan line was doubled.

For a region of the given pictures consisting of a 32x32 matrix of points, the areal dimension is approximately 54x96 n.mi. The vertical dimension of 96 n.mi. was obtained by multiplying the average resolution of a point (3 n.mi.) by 32, the number of vertical lines in a 32x32 array. The average resolution was taken as the average of the 2 n.mi. visible resolution and the 4 n.mi. infrared resolution. For the horizontal dimension, points did not represent contiguous fields of view. The sampling rate of the digitizer was approximately 1.7 samples per field of view, resulting in approximately 18 contiguous fields of view for 32 points. Multiplying 18 (the number of contiguous fields of view) by 3 n.mi. gives an approximate horizontal dimension of 54 n.mi. A description of the data and of the accompanying cloud category map prepared by meteorologists from the National Environmental Satellite Service is given in [9].
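The infrared rescaling and the 54x96 n.mi. block dimensions described above reduce to a few lines of arithmetic. The sketch below is illustrative only: clipping out-of-range temperatures and NumPy's round-half-to-even behaviour are assumptions, and all names are hypothetical.

```python
import numpy as np


def rescale_infrared(temps_kelvin: np.ndarray) -> np.ndarray:
    """Round 160.0-330.0 K brightness temperatures to whole degrees and shift by -160.

    Result: integer counts from 0 (coldest, white) to 170 (warmest, black).
    Values outside the range are clipped (an assumption, not stated in the text).
    """
    t = np.clip(temps_kelvin, 160.0, 330.0)
    return (np.rint(t) - 160).astype(np.int16)  # np.rint rounds exact halves to even


# Areal dimension of a 32x32 block, following the arithmetic in the text.
avg_resolution_nmi = (2.0 + 4.0) / 2       # mean of visible and infrared resolution
vertical_nmi = 32 * avg_resolution_nmi      # 32 scan lines
fields_of_view = int(32 / 1.7)              # 32 / 1.7 is about 18.8; whole fields give 18
horizontal_nmi = fields_of_view * avg_resolution_nmi
print(vertical_nmi, horizontal_nmi)
```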

2.2 Description of the Cloud-Truth Analysis for NOAA-1 Satellite Data

This section summarizes the description given in [9] of the method used to prepare the cloud category map, shown in Figure 3, for the satellite cloud data of Figures 1 and 2. Two meteorologists, furnished with ungridded enlargements of the infrared and visual NOAA-1 satellite cloud data, gridded enlargements (with the 1120x960 data arrays gridded into 35x30 observations, each containing 32x32 data points), and near-time-coincident movie loops of visual imagery from the ATS-1 satellite, were asked initially to identify all possible cloud types observable in the given region. The analysis resulted in a categorization of each of the 1050 sample areas (pairs of 32x32 infrared and visual matrices) into one of the following eight groups:

(1) no observable clouds

(2) cumulus/cumulus congestus

(3) stratocumulus/stratus

(4) cumulonimbus

(5) cirrus

(6) cirrus with low clouds

(7) cirrus with middle clouds

(8) cirrus with low and middle clouds

Within any particular group, the cloud amount for the predominant cloud type was variable; for example, an observation with a very small amount of cumulus cloud was classified into group 2 rather than group 1. Also, whenever cumulonimbus clouds occurred in combination with any other cloud type, regardless of the amount of cumulonimbus present, the observation was classified as cumulonimbus. The observations were then automatically classified as described in [9]. The number of correct classifications was 675 out of 1050. The 375 misclassified observations were re-examined by the two meteorologists, and 184 of the 375 samples were re-classified, with the new classifications for 115 of the 184 samples in agreement with the corresponding automatic classifications. The group frequencies for the sample observations at the conclusion of this final analysis were:

(1) group 1 - 113 samples

(2) group 2 - 258 samples

(3) group 3 - 155 samples

(4) group 4 - 117 samples

(5) group 5 - 174 samples

(6) group 6 - 182 samples

(7) group 7 - 38 samples

(8) group 8 - 13 samples

The resultant cloud category map, in which each of the 1050 observations is identified by its group label ("1" to "8"), is shown in Figure 3.

Since the areas chosen for wind velocity estimation usually consist of 64x64 data points, every 2x2 group of four 32x32 sample observations from the 1050 sample observations was combined to form 255 new sample observations (each consisting of 64x64 data points), of which 243 contained cloud data. The cloud category map of Figure 3 was then reduced to the cloud category map of Figure 4.

Any 2x2 configuration of observations in the original cloud category map of Figure 3 that contained only 1's, 2's, or 3's and at least one 2 or 3 was labelled "L" for "low cloud" in the cloud category map of Figure 4. Any such configuration that contained one or more 6's, 7's, or 8's was labelled "M" for "mixed cloud". Any configuration that contained one or more 4's was labelled "Cb" for "cumulonimbus cloud". Any configuration that contained only 5's, or only 5's and 1's, was labelled "Ci" for "cirrus cloud".

A configuration containing one or more 5's combined with one or more 2's or 3's was also classified as mixed cloud. Finally, any configuration containing all 1's (no observable clouds) was not processed. The group frequencies for the four cloud types of the cloud category map of Figure 4 are:

(1) low clouds - 86 samples

(2) mixed clouds - 87 samples

(3) cirrus clouds - 24 samples

(4) cumulonimbus clouds - 40 samples
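The labelling rules above can be written as a small function over the group labels of the four 32x32 observations that make up each 64x64 sample (a 2x2 block, since 64/32 = 2). This is a sketch under stated assumptions: the rules are checked in the order Cb, M, L, Ci, so that any cumulonimbus overrides everything else as in the Class 1 definition, and trailing map rows or columns that do not fill a block are dropped, which turns the 35x30 map into 17x15 = 255 blocks. The function names are hypothetical.

```python
from typing import Optional

import numpy as np


def label_block(groups) -> Optional[str]:
    """Map the group labels (1-8) of four observations to "Cb", "M", "L", "Ci", or None."""
    g = {int(v) for v in np.ravel(groups)}
    if 4 in g:                                   # any cumulonimbus wins
        return "Cb"
    if g & {6, 7, 8} or (5 in g and g & {2, 3}):
        return "M"                               # multi-layered, or cirrus next to low cloud
    if g <= {1, 2, 3} and g & {2, 3}:
        return "L"
    if g <= {1, 5} and 5 in g:
        return "Ci"
    return None                                  # all 1's: no observable clouds, not processed


def reduce_map(group_map: np.ndarray) -> list:
    """Label each non-overlapping 2x2 block of a (rows, cols) map of group labels."""
    rows, cols = group_map.shape[0] // 2, group_map.shape[1] // 2
    return [[label_block(group_map[2 * r:2 * r + 2, 2 * c:2 * c + 2]) for c in range(cols)]
            for r in range(rows)]
```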

The cloud category map of Figure 4, together with the 243 sample pairs of visual and infrared 64x64 data arrays, forms the data base used to investigate the optimal design of a statistical cloud classification system for wind velocity estimation.
3. Feature Selection

3.1 First Order and Second Order (Textural) Statistical Features


First-order statistical features of an image are features that describe or characterize the density function p(g) of the image gray levels. For comparison of finite images, each with the same number of gray levels, the histogram or frequency distribution of gray level values in the image can be used as the basic function whose properties are to be described by a given set of features.

For the classification of cloud patterns, the first-order statistical features chosen to describe the infrared and visual histograms were the mean; the standard deviation; the gray level values with cumulative frequency percentages of 0%, 10%, ..., 100%; and the differences between pairs of gray level values with cumulative frequency percentages of 0% and 100%, 10% and 90%, 0% and 50%, 50% and 100%, 20% and 80%, 30% and 70%, and 40% and 60%. Feature definitions for both first-order and second-order statistical features may be found in Appendix A.
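Appendix A is not reproduced here, so the sketch below assumes one plausible reading of these definitions: the gray level "with cumulative frequency percentage p" is taken as the smallest gray level whose cumulative frequency reaches p percent (p = 0 gives the minimum), and the standard deviation is the population form. Feature names are hypothetical.

```python
import numpy as np

PERCENTS = range(0, 101, 10)
# Pairs of cumulative percentages whose gray level differences are used as features.
DIFF_PAIRS = [(0, 100), (10, 90), (0, 50), (50, 100), (20, 80), (30, 70), (40, 60)]


def gray_level_at(sorted_values: np.ndarray, pct: int) -> int:
    """Smallest gray level whose cumulative frequency reaches pct percent."""
    n = sorted_values.size
    k = max(-(-pct * n // 100) - 1, 0)  # integer form of ceil(pct * n / 100) - 1
    return int(sorted_values[k])


def first_order_features(image: np.ndarray) -> dict:
    """Mean, standard deviation, percentile gray levels, and their differences."""
    values = np.sort(image.ravel())
    feats = {"mean": float(values.mean()), "std": float(values.std())}
    levels = {p: gray_level_at(values, p) for p in PERCENTS}
    feats.update({f"level_{p}": g for p, g in levels.items()})
    feats.update({f"diff_{lo}_{hi}": levels[hi] - levels[lo] for lo, hi in DIFF_PAIRS})
    return feats
```

Under these definitions each channel yields 20 values (mean, standard deviation, 11 percentile levels, and 7 differences); integer arithmetic is used for the rank so that floating-point error cannot shift a percentile by one gray level.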

Second-order (textural) statistical features of an image are features that describe or characterize the joint density function $p_{\rho,\theta}(g_1, g_2)$ of pairs of gray levels separated from each other in the orientation or direction $\theta$ by a distance $\rho$. That is, textural features characterize the spatial dependency of gray levels [10]. For textures composed of "elements" (e.g., small pieces of relatively constant gray level), if $\rho$ is small compared to the element size, then $p_{\rho,\theta}(g_1, g_2)$ will tend to be high for small $|g_1 - g_2|$ and low otherwise; thus, measuring $p_{\rho,\theta}$ for various values of $\rho$ provides information about texture element sizes. For $\rho$ large compared to the element size, or if there are no significant texture elements, $p_{\rho,\theta}(g_1, g_2)$ will essentially be the probability that two randomly chosen image points have gray levels $g_1$ and $g_2$, respectively. In such cases, second-order statistics will not provide any useful information beyond that available from first-order statistics.
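The effect of the displacement $\rho$ relative to the element size can be seen by estimating $p_{\rho,\theta}(g_1, g_2)$ directly. The sketch below assumes a direction convention (0° to the right, 90° upward) and uses a hypothetical test image made of 8x8 tiles of constant gray level arranged as a checkerboard; it is an illustration, not the study's implementation.

```python
import numpy as np

# (row, col) step per unit distance; 0 deg = right, 90 deg = up (assumed convention)
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def cooccurrence(image: np.ndarray, rho: int, theta: int, levels: int) -> np.ndarray:
    """Estimate p_{rho,theta}(g1, g2) from all pixel pairs at displacement (rho, theta)."""
    dr, dc = (rho * s for s in OFFSETS[theta])
    rows, cols = image.shape
    a = image[max(0, -dr):rows - max(0, dr), max(0, -dc):cols - max(0, dc)]
    b = image[max(0, dr):rows - max(0, -dr), max(0, dc):cols - max(0, -dc)]
    p = np.zeros((levels, levels))
    np.add.at(p, (a.ravel(), b.ravel()), 1)  # count each (g1, g2) pair
    return p / p.sum()


# Hypothetical 32x32 image: 8x8 tiles of gray level 0 or 3 in a checkerboard.
tile_index = np.arange(32) // 8
img = ((tile_index[:, None] + tile_index[None, :]) % 2) * 3
near = cooccurrence(img, 1, 0, 4)  # rho much smaller than the tile size
far = cooccurrence(img, 8, 0, 4)   # rho equal to the tile size
print(np.trace(near), np.trace(far))  # probability mass on g1 == g2
```

At $\rho = 1$ most pairs fall inside a tile, so the mass concentrates on the diagonal $g_1 = g_2$ (28 of every 31 horizontal pairs); at $\rho = 8$ every pair straddles a tile boundary and none does, which is how varying $\rho$ reveals the element size.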

Historically, textural features have been used to characterize cloud types. Expressions such as "fibrous appearance" have been used to characterize cirrus clouds; altocumulus is described in terms of "regularly arranged small elements [which] usually have an apparent width of between one and five degrees"; cirrocumulus is said to be "composed of very small elements in the form of grains, ripples, etc., merged or separate, and more or less regularly arranged" [11].

Textural features have been employed not only in manual but also in automatic cloud pattern analysis systems. In 1968, Darling and Joseph [12] published a classic paper comparing various decision algorithms using a set of 28 discriminators, including "15 quantities [designed to] measure the general textural characteristics of the scene". Since then, texture features have been incorporated into major cloud pattern analysis studies. In 1972, Booth [9] found that for both visual and infrared images the average digital gradient, a textural feature, entered into the discriminant function calculation at the 1% screening level. Aggarwal and Duda [13] state that "in the cloud tracking problem there are measurable differences in brightness, boundary shape, and texture between clouds in different layers."

The incorporation of textural features into this study is an attempt to investigate measurable differences in texture between four classes of clouds chosen for their relevance to automatic wind velocity estimation programs. The calculation of these textural features is based on areas that are often not uniformly covered by one particular cloud layer: variable amounts of sea surface and other cloud types may be present in the given imagery. This problem is inherent in any regular partitioning of a satellite picture into areas large enough for subjective meteorological classification, and it may degrade the calculation of textural features. As will be seen in Section 4.3, the results of this study strongly suggest that features such as brightness and standard deviation are far more significant than textural features in identifying cloud layers.

The textural features extracted from each of the 243 cloud observations for distances ρ = 1, 2, 4, 8 and directions θ = 0°, 45°, 90°, 135° are:

1) mean of gray level difference values -- the expected value of the absolute gray level difference, which ranges from 0 to 255;

2) contrast -- the expected value of the squared gray level difference;

3) angular second moment -- the sum of the squares of the difference probabilities;

4) entropy -- the negative sum of the products of the difference probabilities and their logarithms;

5) statistics (mean, standard deviation, minimum, maximum, and range) of each of quantities (1), (2), (3), and (4), calculated for a given distance over all four directions.
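The list above can be turned into a feature extractor built on the histogram of absolute gray level differences at each displacement. This is a minimal sketch: the direction convention, the natural logarithm in the entropy, and the population standard deviation over directions are assumptions, and any normalization defined in Appendix A is not applied. Function and key names are hypothetical.

```python
import numpy as np

# (row, col) step per unit distance; 0 deg = right, 90 deg = up (assumed convention)
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}
QUANTITIES = ("mean", "contrast", "asm", "entropy")


def difference_histogram(image: np.ndarray, rho: int, theta: int, levels: int = 256) -> np.ndarray:
    """Probability of each absolute gray level difference |g1 - g2| at (rho, theta)."""
    dr, dc = (rho * s for s in OFFSETS[theta])
    rows, cols = image.shape
    a = image[max(0, -dr):rows - max(0, dr), max(0, -dc):cols - max(0, dc)].astype(int)
    b = image[max(0, dr):rows - max(0, -dr), max(0, dc):cols - max(0, -dc)].astype(int)
    diffs = np.abs(a - b).ravel()
    return np.bincount(diffs, minlength=levels) / diffs.size


def difference_features(p: np.ndarray) -> dict:
    k = np.arange(p.size)
    nz = p[p > 0]
    return {
        "mean": float(np.sum(k * p)),              # expected absolute difference
        "contrast": float(np.sum(k ** 2 * p)),     # expected squared difference
        "asm": float(np.sum(p ** 2)),              # angular second moment
        "entropy": float(-np.sum(nz * np.log(nz))),
    }


def texture_features(image: np.ndarray, distances=(1, 2, 4, 8), levels: int = 256) -> dict:
    feats = {}
    for rho in distances:
        per_dir = {th: difference_features(difference_histogram(image, rho, th, levels))
                   for th in OFFSETS}
        for q in QUANTITIES:
            vals = np.array([per_dir[th][q] for th in OFFSETS])
            feats.update({f"{q}_d{rho}_{th}": per_dir[th][q] for th in OFFSETS})
            feats.update({
                f"{q}_d{rho}_mean": float(vals.mean()),
                f"{q}_d{rho}_std": float(vals.std()),
                f"{q}_d{rho}_min": float(vals.min()),
                f"{q}_d{rho}_max": float(vals.max()),
                f"{q}_d{rho}_range": float(vals.max() - vals.min()),
            })
    return feats
```

Per distance this yields 4 directions x 4 quantities plus 5 statistics x 4 quantities = 36 values, and 144 over the four distances, matching the 144 visual and 144 infrared second-order features counted in the report.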

The definitions of the above 288 second-order statistical features (144 visual and 144 infrared) can be found in Appendix A. A discussion of the relationship between difference features and other textural features can be found in Section 2 and Appendix B of [14].

3.2 Class Comparison of Statistical Measures of Individual Features

For each of the four cloud classes (low, mix, cirrus, cumulonimbus), class means, standard deviations, minimums, maximums, medians, and ranges were calculated for each feature. Large differences between class mean values, coupled with small class standard deviations, indicate a feature likely to contribute to class separability. If the feature tends to be normally distributed, one would expect the median to closely approximate the mean. The minimum, maximum, and range values give some idea of the overlap between the feature values of the classes being compared.
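The per-class summary described above, together with a simple overlap test on the [minimum, maximum] intervals, can be sketched as follows. The use of the sample (n - 1) standard deviation and the interval-overlap criterion are assumptions; the class values in the usage are hypothetical.

```python
import numpy as np


def class_summary(values_by_class: dict) -> dict:
    """Per-class descriptive statistics for a single feature."""
    out = {}
    for name, v in values_by_class.items():
        v = np.asarray(v, dtype=float)
        out[name] = {
            "mean": float(v.mean()),
            "std": float(v.std(ddof=1)),  # sample standard deviation (an assumption)
            "min": float(v.min()),
            "max": float(v.max()),
            "median": float(np.median(v)),
            "range": float(v.max() - v.min()),
        }
    return out


def ranges_overlap(a: dict, b: dict) -> bool:
    """True if the [min, max] intervals of two classes intersect."""
    return a["min"] <= b["max"] and b["min"] <= a["max"]


# Hypothetical feature values for three classes.
stats = class_summary({"low": [1, 2, 3, 4], "mix": [3, 5, 6, 8], "cirrus": [10, 11, 12]})
```

A feature whose class intervals do not overlap for some pair of classes separates that pair perfectly on the training samples; overlapping intervals, as reported below for the low and mix classes, signal that the pair must be separated by other features or combinations of features.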

The visual brightness feature values, shown in Tables 2-5, consistently reveal the same pattern: low (dark) values for cirrus clouds with little variation between samples, brighter values for low and mix clouds, and very bright values for cumulonimbus clouds (as evidenced by the minimum value of feature 113 and the mean value of feature 115). A large overlap occurs between the low and mix classes throughout the spectrum of visual brightness features (note in particular feature 117), with mix clouds generally slightly brighter than low clouds.

This overlap between the low and mix classes also occurs in the values of infrared temperature features 302 and 314-320, shown in Tables 6-9. For standard deviation feature 302 and temperature range features 314-320, there is little variation in temperature values for low clouds, and increasing variation for mix, cirrus, and cumulonimbus clouds. The coldest (brightest) temperature values (feature 303) belong to cumulonimbus clouds. The next coldest values can be found in the mix and cirrus sample observations. Low clouds are generally warm.

Second-order visual difference statistics for distance 1 (Tables 10-13) measure factors such as the amount of local variation and the overall homogeneity of the images [10]. The mean texture features 121-129, representing normalized expected values of differences of neighboring gray levels for directions 0°, 45°, 90°, 135° and statistics on these expected values, are lowest for cirrus clouds, highest for cumulonimbus clouds, and approximately equal for low and mix clouds, with slightly higher values for mix clouds. Similar remarks apply to the contrast texture features 130-138, with features 130-133 representing expected values of squared differences over the four directions, and to the entropy texture features 148-156, which measure the complexity of an image. Entropy feature values are highest when there are many different difference values, i.e., for a more complex image, whereas the reverse pattern can be seen for the ASM (angular second moment) features 139-147, which are lowest when there are many different difference values. For ASM features 139-147, therefore, the highest values occur for cirrus clouds and the lowest values for cumulonimbus clouds, with considerable overlap between the values for low and mix clouds (the minimum values for mix clouds being lower than those for low clouds). The visual textural features seem to duplicate the same pattern of homogeneity vs. complexity exhibited by the visual brightness standard deviation feature 102. Comparing all four groups of textural features by their standard deviations relative to their mean values, the visual contrast features 130-138 in particular show unusually large scatter around their mean values compared to the visual entropy features 148-156.

The second-order infrared difference statistics for distance
1 (Tables 14-17) measure local variations in temperature.
The lowest values (smallest temperature variations) occur for
low clouds and the highest values for the dense thunderstorm
cumulonimbus clouds. Local temperature variation is greater for
cirrus clouds than for mix clouds. This result can be explained
partly by differences in emissivity between the thin and
dense portions of the cirrus clouds (with warmer apparent
temperatures for the thin, less emissive portions) and partly by the
method of classifying a sample as "mix" cloud. (If, for example,
at least 1/4 of the sample consisted predominantly of multilayered
cloud regions and the other 3/4 of the sample was predominantly
low cloud, the sample was classified as "mix".) In
many "mix" samples, a large portion of the sample contained
relatively homogeneous low clouds. The overlap between mix and
cirrus feature values was particularly evident for the ASM
features 339-347 (in fact, for feature 339, mix clouds appear
slightly more homogeneous than cirrus clouds). As noted in the
previous paragraph for the visual features, the angular second
moment features show the reverse of the pattern of the mean,
contrast, and entropy features, and the mean, contrast, and
entropy texture features duplicate the temperature variation
pattern of the infrared standard deviation feature 302.



3.3 Results of Application of the Fisher Distance Criterion to
the Feature Selection Problem

The problem of selection of features for class separability
consists of choosing, at each stage $s_i$ of a classification
decision, that set of $K_{s_i}$ features which contributes most
effectively to the final discrimination of cloud classes. In order
to measure class separability, the underlying class distributions
must be assumed, and a given classification decision logic,
including the type of classifier at each level and the classes of
cloud patterns to be distinguished at each level, must be specified.

If  the  classifier  chosen  is  the  Bayes  classifier  (rather  than, 
for  example,  a nearest  neighbor  classifier),  the  Bayes  probabi- 
lity of  error  is  the  optimum  measure  of  feature  effectiveness. 

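For two classes $\omega_1$, $\omega_2$ the Bayes classifier (decision rule) for
one-dimensional (single-feature) feature vectors is:

$$\text{Decide } \omega_1 \text{ if } \frac{p(x \mid \omega_1)}{p(x \mid \omega_2)} > \frac{P(\omega_2)}{P(\omega_1)}, \qquad \text{decide } \omega_2 \text{ if } \frac{p(x \mid \omega_1)}{p(x \mid \omega_2)} < \frac{P(\omega_2)}{P(\omega_1)},$$

where $p(x \mid \omega_i)$, $i = 1, 2$, are the class-conditional probability
densities and $P(\omega_i)$, $i = 1, 2$, are the a priori probabilities. Let
$R_1$ be the region in which $p(x \mid \omega_1)/p(x \mid \omega_2) > P(\omega_2)/P(\omega_1)$, i.e., the region
in which the decision $\omega_1$ is made, and let $R_2$ be the region in
which $p(x \mid \omega_1)/p(x \mid \omega_2) < P(\omega_2)/P(\omega_1)$, i.e., the region in which the
decision $\omega_2$ is made. The threshold $t$ between $R_1$ and $R_2$ for two
normal distributions with means $\mu_1$ and $\mu_2$ and with equal
standard deviations $\sigma$ is found by setting

$$\frac{p(t \mid \omega_1)}{p(t \mid \omega_2)} = \frac{P(\omega_2)}{P(\omega_1)}.$$

Assuming equal a priori probabilities $P(\omega_1) = P(\omega_2) = \tfrac{1}{2}$ and
substituting the expression for the univariate normal densities,
it follows that

$$\frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{t-\mu_1}{\sigma}\right)^2} = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{t-\mu_2}{\sigma}\right)^2},$$

and taking natural logarithms on both sides of the equation, it
follows that

$$t^2 - 2\mu_1 t + \mu_1^2 = t^2 - 2\mu_2 t + \mu_2^2, \qquad t = \frac{\mu_2 + \mu_1}{2},$$

i.e., the threshold lies midway between the means. If $R_2$ is the
region to the right of $R_1$, i.e., $\mu_2 > \mu_1$, then the Bayes
probability of error is given by

$$\begin{aligned}
P(\text{error}) &= P(x \in R_2, \omega_1) + P(x \in R_1, \omega_2) \\
&= \int_t^{\infty} p(x \mid \omega_1)\, P(\omega_1)\, dx + \int_{-\infty}^{t} p(x \mid \omega_2)\, P(\omega_2)\, dx \\
&= \frac{1}{2}\int_t^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{x-\mu_1}{\sigma}\right)^2} dx + \frac{1}{2}\int_{-\infty}^{t} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{x-\mu_2}{\sigma}\right)^2} dx,
\end{aligned}$$

which, substituting $u = (x-\mu_1)/\sigma$, $du = dx/\sigma$, $v = -(x-\mu_2)/\sigma$,
$dv = -dx/\sigma$, gives

$$P(\text{error}) = \frac{1}{2}\int_{(t-\mu_1)/\sigma}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}\, du - \frac{1}{2}\int_{\infty}^{(\mu_2-t)/\sigma} \frac{1}{\sqrt{2\pi}}\, e^{-v^2/2}\, dv,$$

which, substituting for $t$, reversing the limits of integration of
the second integral, and collecting terms, yields

$$P(\text{error}) = \int_{\frac{\mu_2-\mu_1}{2\sigma}}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\, dy.$$

Note that the larger the value of the integration limit $(\mu_2-\mu_1)/2\sigma$,
the smaller the probability of error.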
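The closed-form result can be checked numerically. The sketch below (function names are illustrative, not from the report) evaluates the final integral through the complementary error function, using $\int_a^\infty \frac{1}{\sqrt{2\pi}}e^{-y^2/2}\,dy = \tfrac12\,\mathrm{erfc}(a/\sqrt{2})$, and compares it with a direct midpoint-rule evaluation of the two original integrals. As in the derivation, it assumes equal priors and a common $\sigma$.

```python
import math

def bayes_error_closed_form(mu1, mu2, sigma):
    """Upper-tail N(0, 1) area beyond the integration limit |mu2 - mu1| / (2 sigma).

    Assumes two equiprobable normal classes with a common standard deviation sigma.
    """
    a = abs(mu2 - mu1) / (2.0 * sigma)          # the integration limit
    return 0.5 * math.erfc(a / math.sqrt(2.0))  # integral of N(0, 1) density from a to infinity

def bayes_error_numeric(mu1, mu2, sigma, n=200_000, width=10.0):
    """Midpoint-rule evaluation of the two original integrals (requires mu1 < mu2)."""
    t = 0.5 * (mu1 + mu2)  # threshold midway between the means

    def density(x, mu):
        return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (math.sqrt(2.0 * math.pi) * sigma)

    lo, hi = mu1 - width * sigma, mu2 + width * sigma
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * h
        # right of t: class w1 mass decided as w2; left of t: class w2 mass decided as w1
        total += 0.5 * (density(x, mu1) if x > t else density(x, mu2)) * h
    return total

for separation in (0.5, 1.0, 2.0, 4.0):
    print(separation, round(bayes_error_closed_form(0.0, separation, 1.0), 4))
```

Increasing the separation of the means for a fixed $\sigma$ enlarges the integration limit and lowers the error, as noted above.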

The Fisher distance feature selection criterion for
evaluation of the ability of a single feature to separate
classes $\omega_1$ and $\omega_2$ is given by

$$\frac{|\mu_2 - \mu_1|}{\sigma_1 + \sigma_2},$$

where $\mu_i$, $\sigma_i$ are the mean and standard deviation of the feature
for samples in class $\omega_i$, $i = 1, 2$. For normal distributions and
equal standard deviations, the Fisher distance is an optimum
feature selection criterion, i.e., it is monotonically related to
the probability of error as shown in the previous paragraph: with
$\sigma_1 = \sigma_2 = \sigma$ it reduces to the integration limit $|\mu_2 - \mu_1|/2\sigma$.
Except for the equal covariance case, even for normal distributions,
the calculation of the probability of error requires a numerical
integration. Various alternatives to calculation of the probability
of error have been proposed. Most of these feature selection
criteria, for example, the Fisher distance feature selection
criterion, are designed to increase as the between-class scatter
increases or the within-class scatter decreases.
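A single-feature Fisher ranking of the kind tabulated for each two-class combination can be sketched as below. The feature identifiers and sample values are hypothetical, and using the sample (n - 1) standard deviation is an assumption. When the two class standard deviations are equal to a common $\sigma$, the returned value is exactly the integration limit $|\mu_2-\mu_1|/2\sigma$ from the error derivation.

```python
from statistics import fmean, stdev

def fisher_distance(a, b):
    """|mu2 - mu1| / (sigma1 + sigma2) for one feature measured on two classes."""
    return abs(fmean(b) - fmean(a)) / (stdev(a) + stdev(b))

def rank_features(class1, class2):
    """Feature ids sorted by decreasing Fisher distance for one two-class combination.

    class1 and class2 map each feature id to that feature's values over the class samples.
    """
    scores = {f: fisher_distance(class1[f], class2[f]) for f in class1}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical values for two features on a handful of samples per class.
class1 = {"F1": [1.0, 2.0, 3.0], "F2": [10.0, 12.0, 14.0]}
class2 = {"F1": [5.0, 6.0, 7.0], "F2": [11.0, 13.0, 15.0]}
print(rank_features(class1, class2))
```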

The extension of the feature selection algorithms to multiclass
(more than two) separability and multifeature separability is
unfortunately not simple. According to Fukunaga [15],
"no single criterion can be particularly indicative of multiclass
separability". Sometimes an upper bound on the probability of
error is used. Theoretical relationships between the best pair
(or triple, etc.) of features for class separability and the best
single features for class separability are lacking. This problem
is discussed in detail in Kanal [16], where it is stated
that "the only way to ensure that the best subset of k features
from a set of N is chosen is to explore all $\binom{N}{k}$ possible
combinations". In practice, however, a feature selection criterion
for single features is often used to discard the worst features,
and subsequent trials of combinations tend to concentrate on
adding a feature to those which performed well in the next
lower-dimensional feature vector space.
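The gap between exhaustive search and the usual greedy practice can be illustrated with a toy criterion. In the sketch below, the subset score is a hypothetical lookup table, not any criterion used in the report. The best single feature ("a") is not part of the best pair ("b", "c"), so growing the best lower-dimensional subset one feature at a time misses the optimum that the exhaustive search over all $\binom{N}{k}$ subsets finds.

```python
from itertools import combinations
from math import comb

def exhaustive_best(features, k, score):
    """Evaluate all C(N, k) subsets and return the highest-scoring one."""
    return max(combinations(features, k), key=score)

def forward_select(features, k, score):
    """Greedy: repeatedly add the feature that best extends the current subset."""
    chosen = []
    for _ in range(k):
        remaining = [f for f in features if f not in chosen]
        chosen.append(max(remaining, key=lambda f: score(tuple(chosen + [f]))))
    return tuple(chosen)

# Hypothetical separability scores for every subset of three features.
TOY = {
    frozenset("a"): 0.60, frozenset("b"): 0.50, frozenset("c"): 0.50,
    frozenset("ab"): 0.70, frozenset("ac"): 0.65, frozenset("bc"): 0.90,
}

def toy_score(subset):
    return TOY[frozenset(subset)]

features = ["a", "b", "c"]
print(forward_select(features, 2, toy_score))    # greedy keeps "a", then adds "b"
print(exhaustive_best(features, 2, toy_score))   # exhaustive search finds ("b", "c")
print(comb(20, 5))  # subsets an exhaustive search must score for k = 5 out of N = 20
```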

The Fisher distance feature selection criterion was
applied first to the problem of deciding whether distance 1,
distance 2, distance 4, or distance 8 second-order textural
statistics were most effective in discriminating between pairs
of  cloud  classes.  Fisher  distance  values  for  visual  differ- 
ence features  121-264  are  given  in  Tables  20-23  for  distances 
1,  2,  4,  8,  respectively,  and  Fisher  distance  values  for  infra- 
red difference  features  321-464  are  given  in  Tables  24-27  for 
distances  1,  2,  4,  8.  For  each  of  the  difference  features  and 
for  each  of  the  six  two-class  combinations,  a comparison  of 
the  values  of  the  Fisher  distance  for  distance  1,  distance  2, 
distance  4,  and  distance  8 was  made.  A final  tally  revealed 
that  distance  1 was  the  best  overall  choice.  Distance  8 infra- 
red difference  features  performed  slightly  better  than  distance 
1 features  for  separation  of  low  and  mix  clouds.  Distance  8 
infrared  difference  features  were  also  useful  for  separation 
of  low  from  cirrus  clouds  and  distance  8 visual  difference 
features  were  useful  for  separation  of  low  from  cumulonimbus 
clouds  and  low  from  mix  clouds.  This  suggests  that  a typical 
difference  pattern  between  elements  of  low  clouds  prevails  over 
a large  area. 

The values of the Fisher distances in Tables 20-27 can
also be used to determine which channel best separates each of the
six two-class combinations. The infrared channel is to be
preferred for any two-class combination containing low clouds.
Differences between the cirrus cloud observations and observations
of either mix clouds or cumulonimbus clouds are more
pronounced in the visible image, in which cirrus clouds appear
dark gray and mix and cumulonimbus clouds bright, than in the
infrared. Fisher distance values for separation of mix from
cumulonimbus clouds are approximately equal for infrared
difference features and visual difference features, with infrared
entropy features slightly better than the others.

Evaluation of distance 1 features for each of the two-class
combinations pointed to the superiority of the infrared entropy
features 348-356 for separation of low clouds from either mix,
cirrus, or cumulonimbus clouds and for separation of mix from
cumulonimbus clouds. Visual entropy features 148-156
performed  well  in  discrimination  of  cirrus  from  cumulonimbus 
and  from  mix  clouds,  with  visual  ASM  features  slightly  better 
than  visual  entropy  features  148-156  or  visual  mean  features 
121-129  for  separation  of  cirrus  from  mix.  Infrared  mean 
features  321-329  were  second-best  to  infrared  entropy  features 
for  the  two-class  problems  of  low  vs.  mix,  low  vs.  cirrus,  and 
mix  vs.  cumulonimbus,  with  infrared  ASM  features  339-347  second 
best  for  the  low  vs.  cumulonimbus  combination.  Visual  ASM 
features  139-147  were  second-best  for  cirrus  vs.  cumulonimbus. 
As  a result  of  the  analysis  of  Tables  20  and  24,  it  was  decided 
to  discard  the  infrared  and  visual  contrast  features  330-338 
and  130-138  and  to  discard  the  statistical  features  on  the 
directions  of  the  second-order  textural  features,  since  there 
seemed  to  be  no  specific  improvement  in  the  Fisher  distance 
values  of  the  statistical  features  over  the  Fisher  distance 
values  of  the  basic  second-order  features.  The  second-order 
textural  features  which  were  retained  as  input  for  various 
classification  designs  were  mean  features  121-124  and  321-324, 
ASM  features  139-142  and  339-342,  and  entropy  features  148-151 
and  348-351. 

The highest of the Fisher distance values for first-order
statistical features, shown in Tables 18 and 19, were
significantly larger than the highest values for the textural
features (Tables 20-27) for all two-class combinations except mix
vs. cirrus, where horizontal visual ASM feature 139, horizontal
visual mean feature 121, and horizontal visual entropy feature
148 performed better than any first-order visual statistical
feature. The highest Fisher distance value for separation of
mix from cirrus was found in Table 19 for feature number 313,
which represents sea-surface temperature. However, incorporation
of any particular feature into a classification design must
be predicated on knowledge of the meteorological significance
of the feature. The tendency of the cirrus samples to cluster
in a particular geographical region contributed to the ability
of infrared feature 313 (see Table 19) to discriminate
cirrus from low clouds, mix clouds, or cumulonimbus
clouds. After discarding feature 313, it can be seen that
visual feature 114, representing the difference between the
darkest point and the brightest point in the visible image, is
the best single feature for separation of cirrus from mix, as
well as of cirrus from cumulonimbus and mix from cumulonimbus.
For separation of low clouds from mix, from cirrus, or from
cumulonimbus, the best single feature is infrared feature 314,
representing the difference between the warmest and the coldest
temperature in the infrared image. In addition to features 114
and 314, features 108, 113, 115, 116, 117, 118, 302, 303, 308,
315, 316, 317, and 318 were retained for further processing in
the classification design phase. Features 465-470, based on
analysis of quadrants of infrared images, were calculated
subsequent to initial classification design trials in an attempt
to separate low clouds from mix clouds. Fisher distances for
features 465-470 are recorded in Table 19 to facilitate
comparison with previously computed features.


4.  Design  and  Evaluation  of  Cloud 


4.1 Construction of Classification Decision Logic

The design and evaluation of cloud classification systems
for both the four-class problem (separation of low, mix, cirrus,
and cumulonimbus clouds) and the three-class problem (separation
of low, cirrus, and cumulonimbus clouds) were accomplished by
interactively processing the features retained during the
feature selection phase through various decision logic
structures created by selecting specific options available on
the University of Maryland Interactive Pattern Analysis and
Classification System (MIPACS). The present version of MIPACS,
described in [17], offers the following options at each level
of the decision process:

1) number of user classification categories
$N_{\ell 1}, N_{\ell 2}, \ldots, N_{\ell m}$ to be used in the design of the classifier at
the given level $\ell$,

2) choice of which sample cases are to be inserted into
each of the categories $N_{\ell 1}, \ldots, N_{\ell m}$,

3) choice of which features from the sample feature
vector are to be used for the design of the classifier
at level $\ell$, and

4) choice of the type of statistical classifier to be used at
level $\ell$.

The  data  file  prepared  by  the  user  of  MIPACS  consists  of  a set 
of  cases  (or  samples)  in  which  each  case  number  is  associated 
with  a case  feature  vector.  The  case  number,  for  example, 
could refer to a given window of a satellite picture and the
case feature vector could contain statistics such as the mean
visual  gray  level,  mean  infrared  gray  level,  standard  deviation 
of  the  visual  gray  levels  in  the  window,  and  standard  deviation 
of  the  infrared  gray  levels.  The  classification  function 
("CLASSIFY")  of  MIPACS  processes  every  sample  which  is  an 
element  of  a terminal  node  (terminal  user  classification  cate- 
gory) of  the  classification  decision  tree  through  the  classi- 
fiers stored  at  each  level  of  the  classification  design  until 
the  sample  arrives  at  a terminal  bin.  The  sample  is  then  said 
to  be  an  element  of  a terminal  automatic  classification  cate- 
gory. 
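The sample-routing behaviour of a "CLASSIFY"-style function can be sketched as a walk down a binary tree of trained decision rules. This is a hypothetical illustration, not MIPACS code: the `Node` class, the feature layout, and the threshold rules are assumptions standing in for whatever classifier (maximum likelihood, Fisher linear discriminant, etc.) is trained at each node.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple, Union

@dataclass
class Node:
    """Hypothetical decision node: a trained two-way rule and its two branches."""
    decide: Callable[[Sequence[float]], int]  # returns 0 (first branch) or 1 (second)
    children: Tuple["Tree", "Tree"]           # each child is a Node or a terminal-bin name

Tree = Union[Node, str]

def classify(tree: Tree, features: Sequence[float]) -> str:
    """Route one sample from the root until it drops into a terminal bin (a string)."""
    while isinstance(tree, Node):
        tree = tree.children[tree.decide(features)]
    return tree

# Feature vector layout assumed here: (infrared temperature in K, visual mean gray level).
# Thresholds are illustrative only; in practice each node holds a trained classifier.
n2 = Node(decide=lambda f: int(f[1] > 150.0), children=("A21: high", "A22: cumulonimbus"))
n1 = Node(decide=lambda f: int(f[0] < 260.0), children=("A11: low", n2))

for sample in [(285.0, 90.0), (230.0, 120.0), (220.0, 200.0)]:
    print(sample, "->", classify(n1, sample))
```

In the example, a warm sample stops at the first node, while a cold sample is passed to the second node and binned by its visual brightness.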

Typically,  the  terminal  user  classification  categories 
correspond  to  classes  of  samples  labelled  by  experts  in  the 
particular  discipline  from  which  the  samples  were  drawn.  A two- 
level  decision  process  in  which  the  three  terminal  classes  con- 
sist of  samples  labelled  respectively  by  meteorologists  as  "low" 
cloud,  "high"  cloud,  and  "cumulonimbus"  cloud  could  be  specified 
by  any  one  of  the  three  distinct  decision  trees  below: 


The tree on the left has the two levels shown below.

Classes $N_{11}$, $N_{12}$ and $N_{21}$, $N_{22}$ denote the classes of samples used
to train the classifiers at Decision Node $N_1$ and Decision Node
$N_2$, respectively. The classes $A_{11}$, $A_{21}$, and $A_{22}$ denote the
names of terminal bins into which a sample can fall after being
processed through the "CLASSIFY" function of MIPACS. Two branches
emanate from each of the decision nodes $N_1$ and $N_2$, meaning that
the classification design is based on two classes at each level.

At level 1, the design of the classifier might be based
on inserting into class $N_{11}$ all sample cases labelled by
meteorologists as "low" cloud and on inserting into class $N_{12}$
all sample cases labelled as either "high" cloud or "cumulonimbus"
cloud. At level 2, the design of the classifier might be
based on class $N_{21}$, consisting of all samples labelled as "high"
cloud, and on class $N_{22}$, consisting of all samples labelled as
"cumulonimbus" cloud. For level 1, the infrared mean gray
level and infrared standard deviation of gray levels might be
the features chosen to separate the class of low clouds from
the class of all high and cumulonimbus clouds. For level 2,
the visual mean gray level might be chosen to separate the
class of high clouds from the class of cumulonimbus clouds.

A maximum likelihood classifier could be selected by the
user of MIPACS at level 1 and a Fisher linear discriminant
could be chosen for level 2. The "CLASSIFY" function of MIPACS,
operating at Decision Node $N_1$, would process every low cloud
sample, every high cloud sample, and every cumulonimbus cloud
sample sequentially through Decision Node $N_1$ and Decision Node
$N_2$ (should the sample arrive at that node as a result of the
maximum likelihood classifier stationed at Decision Node $N_1$)
until the sample dropped into one of the terminal bins
designated by $A_{11}$, $A_{21}$, or $A_{22}$. A confusion matrix of the form

would then be displayed to the user of MIPACS. The figures in
row 1 would be interpreted as follows: out of 13 samples
labelled by meteorologists as "low" cloud, the classification
decision logic (specified in terms of choice of decision tree
structure and choice of features and type of classifier for
each level) had classified 10 samples as low cloud, 2 samples
as high cloud (Type I error), and 1 sample as cumulonimbus
cloud (Type I error). The figures in column 1 indicate that
10 low cloud samples were classified as low cloud, 4 high cloud
samples as low cloud (Type II error), and 3 cumulonimbus cloud
samples as low cloud (Type II error).
Similar remarks apply to rows 2 and 3 and columns 2 and 3.
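The row and column bookkeeping described above can be sketched as follows. Row 1 and column 1 of the example matrix reproduce the counts quoted in the text (10, 2, 1 across and 10, 4, 3 down). The remaining entries of rows 2 and 3 are hypothetical and were chosen only to complete the example. The per-class and total percentages correspond to the "percent correctly classified" figures used for the experiment tables.

```python
def error_summary(matrix, labels):
    """Summarize a square confusion matrix.

    Rows are the classes assigned by experts; columns are the automatic classes.
    Type I errors of class i: off-diagonal entries of row i (class-i samples sent elsewhere).
    Type II errors of class i: off-diagonal entries of column i (other samples sent to i).
    """
    k = len(labels)
    per_class = {}
    for i, name in enumerate(labels):
        row_total = sum(matrix[i])
        col_total = sum(matrix[j][i] for j in range(k))
        per_class[name] = {
            "correct": matrix[i][i],
            "type_I": row_total - matrix[i][i],
            "type_II": col_total - matrix[i][i],
            "percent_correct": 100.0 * matrix[i][i] / row_total,
        }
    total = sum(sum(row) for row in matrix)
    overall = 100.0 * sum(matrix[i][i] for i in range(k)) / total
    return per_class, overall

labels = ["low", "high", "cumulonimbus"]
matrix = [
    [10, 2, 1],   # row 1: counts quoted in the text
    [4, 15, 1],   # column-1 entry (4) from the text; 15 and 1 are hypothetical
    [3, 2, 20],   # column-1 entry (3) from the text; 2 and 20 are hypothetical
]
per_class, overall = error_summary(matrix, labels)
print(per_class["low"], round(overall, 1))
```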

4.2 Description of Selected Cloud Classification Structures
In addition to the single-stage decision structures
classically employed in pattern recognition, a
variety of multistage decision tree structures were designed.
Multistage decision tree classifiers, according to Wu [18],
"have the potential for improving the classification accuracy
and the computation efficiency" of single-stage classifiers.
Wu [18] notes, however, that theoretically "the conventional
[single stage, maximum likelihood] procedure with the complete
feature set is optimal in accuracy". The potential for improvement
in accuracy offered by multistage decision trees stems
from the problem of dimensionality.

As the number of features increases, the dimensionality
problem is said to occur at the point where the error involved in
density estimation increases faster than class separability.

The dimensionality problem results from limited numbers of
training samples. Single-stage classifiers, which require large
numbers of features for one-shot multiclass separability, are
more susceptible to the problem of dimensionality than
sequential multistage decision tree classifiers, which often
require fewer features at each node. However, multistage
decision tree classifiers suffer from the problem that at a
given level there may be one or more mixture classes which do
not satisfy the normality typically assumed by
statistical classifiers.

Just as the problem of feature selection cannot be simplified
to selection of the best single features in one-dimensional
space, the problem of design of a multiclass binary decision
tree skeleton cannot be simplified to finding the best two-class
combination for each decision level.

Suppose, for example, that one could determine that a given
set of features and classifier could separate cirrus from
cumulonimbus clouds better than any set of features or classifier
could separate any other pair of cloud classes. It would
then seem logical to design a binary tree skeleton for the
four-class problem bottom-up as follows: Level 3 would involve a
decision between cirrus and cumulonimbus clouds. Level 2 would
involve a decision between whichever one of the remaining
classes (low or mix) was best separated from the combination
class of cirrus and cumulonimbus clouds, etc. Conversely,
one might start top-down and find which one of the four
classes was best separated from a combination class of the
other three. Unfortunately, this procedure fails to produce
the optimal binary tree design for the specified number of
levels and nodes, because there is no simple relationship
between the overall tree performance and the individual
classification performances at decision nodes.

Even  for  statistically  independent  features,  Kulkarni  and 
Kanal  [19]  have  shown  that  optimizing  the  average  correct 
recognition  rate  at  each  node  of  a decision  tree  does  not 
necessarily optimize the total tree performance. The total
tree performance $P_C(T)$ for separation of four classes $\omega_1$, $\omega_2$,
$\omega_3$, $\omega_4$ is given by

$$P_C(T) = \sum_{i=1}^{4} P(\omega_i)\, P_C(\omega_i)$$

where

$P(\omega_i)$ is the a priori probability of class $\omega_i$, and
$P_C(\omega_i)$ is the probability of correct recognition for class $\omega_i$.

For statistically independent features,

$$P_C(\omega_i) = P_C(\omega_i \mid N_{p1}) \cdot P_C(\omega_i \mid N_{p2}) \cdots P_C(\omega_i \mid N_{pm})$$

where

nodes $N_{p1}, N_{p2}, \ldots, N_{pm}$ are the nodes along the path from the
root of the tree leading to the terminal node $\omega_i$, and
$P_C(\omega_i \mid N_{pj})$ is the probability of correct recognition of
class $\omega_i$ at node $N_{pj}$.

The average correct recognition rate $P_C(N_{pj})$ at node $N_{pj}$ is
given by

where $S$ is the set of all terminal nodes below $N_{pj}$. It can be
seen that optimizing $P_C(N_{pj})$ does not necessarily optimize
$P_C(T)$, which involves products of terms of the form $P_C(\omega_i \mid N_{pj})$
instead of linear combinations.
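A small numerical example makes the point concrete. The sketch below assumes a hypothetical three-class tree (root $N_1$ separating $\omega_1$ from $\{\omega_2, \omega_3\}$, node $N_2$ separating $\omega_2$ from $\omega_3$), equal priors, and the product formula for $P_C(\omega_i)$ given above. As an additional assumption, it takes the average correct recognition rate at a node to be the prior-weighted mean of $P_C(\omega_i \mid N_{pj})$ over the classes whose paths pass through that node. All rates are invented for illustration.

```python
from math import prod

# Hypothetical three-class tree: root N1 separates w1 from {w2, w3};
# node N2 separates w2 from w3. PATHS lists the nodes from the root to each class.
PATHS = {"w1": ["N1"], "w2": ["N1", "N2"], "w3": ["N1", "N2"]}
PRIORS = {"w1": 1 / 3, "w2": 1 / 3, "w3": 1 / 3}

def tree_correct_rate(rates, paths=PATHS, priors=PRIORS):
    """P_C(T) = sum_i P(w_i) * prod_j P_C(w_i | N_pj) (statistically independent features)."""
    return sum(priors[w] * prod(rates[n][w] for n in paths[w]) for w in paths)

def node_average_rate(node, rates, paths=PATHS, priors=PRIORS):
    """Assumed node average: prior-weighted mean of P_C(w_i | node) over classes routed through node."""
    below = [w for w in paths if node in paths[w]]
    weight = sum(priors[w] for w in below)
    return sum(priors[w] * rates[node][w] for w in below) / weight

n2 = {"w2": 0.9, "w3": 0.5}  # the same second node in both designs
design_a = {"N1": {"w1": 0.60, "w2": 0.9, "w3": 1.0}, "N2": n2}
design_b = {"N1": {"w1": 0.85, "w2": 0.9, "w3": 0.7}, "N2": n2}

for name, rates in (("A", design_a), ("B", design_b)):
    print(name, round(node_average_rate("N1", rates), 3), round(tree_correct_rate(rates), 3))
```

Design A has the higher average correct recognition rate at the root (about 0.833 vs. 0.817) but the lower overall tree performance (about 0.637 vs. 0.670). Its gain on $\omega_3$ is multiplied by the weak rate of node $N_2$, while its loss on $\omega_1$ is not.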

Kulkarni  and  Kanal  [19]  also  prove  that  the  optimal 
feature  assignment  at  each  node  of  a decision  tree  using  a 
maximum  likelihood  rule  does  not  necessarily  result  in  the 
optimal  overall  feature  assignment  even  for  statistically  in- 
dependent features.  Wu  [18]  mentions  that  "there  are  basically 
two  problems  in  optimizing  the  performance  of  a decision 
tree...  the  complexity  of  the  tree  structure  [and  the  fact 
that]  the  overall  performance  of  proposed  classifier  structure 
cannot  be  predicted  exactly". 

Although, short of exhaustive search, there is no optimal method
of designing a tree skeleton, selecting feature sets for each
decision node, or selecting a classifier for each decision node,
various suboptimal techniques are employed. Often the best
feature subset in a lower-dimensional space is combined with
another feature when increasing feature dimensionality. Tree
design is often predicated on knowledge of the problem domain
coupled with histogram or sequential clustering approaches
(Wu [18]).


Histograms of individual features for each of the four
classes were examined to determine whether there was any
obvious design strategy which would necessitate more than four
terminal nodes (i.e., more than one decision path for a given
class). However, even for mix samples, there seemed to be no
obvious class partition. Hence, it was decided to limit the design
of the tree skeleton to trees with four terminal nodes, representing
the four cloud classes, and also to consider only binary
trees, i.e., to simplify the decision at each of the decision
nodes to a two-way branch. This reduced the number of possible
distinct trees to fifteen: three in which the root node splits the
four classes two against two, and twelve in which the root node
splits one class from the remaining three.
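The count of fifteen can be confirmed by enumeration. The sketch below treats the two branches at each node as unordered, an assumption consistent with counting distinct trees. It then classifies each tree by whether its root splits the classes two against two.

```python
from itertools import combinations

def binary_trees(leaves):
    """Yield every rooted binary tree (unordered branches) whose terminal nodes are `leaves`."""
    leaves = frozenset(leaves)
    if len(leaves) == 1:
        yield next(iter(leaves))
        return
    first, *rest = sorted(leaves)
    # Put `first` in the left subtree so each unordered split is generated exactly once;
    # r other leaves join it, leaving at least one leaf for the right subtree.
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            left = frozenset((first, *extra))
            for lt in binary_trees(left):
                for rt in binary_trees(leaves - left):
                    yield (lt, rt)

def n_leaves(tree):
    return 1 if isinstance(tree, str) else n_leaves(tree[0]) + n_leaves(tree[1])

trees = list(binary_trees(["low", "mix", "cirrus", "cumulonimbus"]))
two_vs_two = [t for t in trees if n_leaves(t[0]) == 2]
print(len(trees), len(two_vs_two), len(trees) - len(two_vs_two))  # 15 3 12
```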


From these fifteen trees, four were selected which
appeared likely to offer optimal performance at one or more
stages of the decision tree structure. Although, as mentioned
previously, maximizing classification performance
at each individual node does not necessarily maximize overall
tree performance, the suboptimal procedure of designing a tree
stage by stage is often employed. A presentation of a search
procedure which incorporates this stage-by-stage design concept
can be found in Section 4.3 of Wu [18]. From a cursory
scan of the mean values of several individual visual brightness
features, it could be seen that there was good class
separability between cirrus and cumulonimbus. Three of the
four decision trees selected (trees 3-5, shown in Figures 7-9)
involved, at one stage of the decision process, a separation
between cirrus and cumulonimbus clouds. The fourth tree,
decision tree 6, shown in Figure 10, involved separation
between cumulonimbus clouds and the remaining three cloud
classes. It was found, for various feature subsets, that the best
one-class-vs.-the-rest separation was obtained when
cumulonimbus clouds were isolated from a mixture of the other
three cloud classes.

Decision trees 7 and 8, shown in Figures 11 and 12, for
the three-class problem were designed after initial four-class
experiments revealed the difficulty of recognizing mix clouds.
Decision tree 2 (Figure 6) resulted from an attempt to correct
the confusion between mix and low clouds, which was the largest
single source of classification error for the single-stage
maximum likelihood classifier of decision tree 1, shown in
Figure 5.

The  maximum  number  of  features  used  to  separate  two  or 
more  classes  of  multivariate  normally  distributed  satellite 
data  was  determined  by  consideration  of  various  theoretical 
and  experimental  results  relating  sample  size,  classification 
accuracy  on  an  independent  test  set,  and  feature  dimensionality. 
Two  questions  must  be  examined  in  this  connection: 

1)  For  a fixed  number  of  training  samples,  what  is  the 
maximum  number  of  features  for  which  the  estimate 
of  the  classification  error  on  a design  set  is  a 
reliable  predictor  of  the  expected  error  on  an  in- 
dependent test  set? 


2) For a given problem domain and fixed sample sizes,
what is the optimum feature dimensionality, in the
sense that as the number of features increases beyond
this optimum, the experimental classification
error on independent test sets tends to increase?
Several theoretical results, summarized in Kanal [16], can be
applied to the solution of the first question. Foley [20]
found that for multivariate normal distributions with equal
known covariance matrices and estimated mean vectors, the ratio
of number of training samples per class to number of features
should be at least three to one. If the covariance matrices
are equal but estimated from samples, Mehrotra [21] recommended
a minimum ratio of five to one. Fukunaga and Kessel [22]
suggested that the ratio of total number of samples to number
of features should be at least ten for the two-class
equal-covariance problem and greater than ten for the
unequal-covariance problem. Experimental results of Fu et al. [23]
illustrated that for multispectral classes of remote sensing
data, the optimum feature dimensionality was between three and
five features for experiments involving 400 training samples
per class and test sets of over 14,000 samples. Wu [18] uses
a maximum feature dimensionality of four in most of his
experiments on multispectral class separability.

In accordance with the criterion of a ten-to-one ratio
of number of samples to number of features, the maximum number
of features selected for classification of cloud pattern
classes was seven, since the total number of samples in the
smallest two classes (cirrus and cumulonimbus) was 24 + 46 = 70.
From experiments on MIPACS, there appeared to be a degradation
in classification results as the number of features increased
from five to six, suggesting that the maximum number of
features for effective discrimination of cloud patterns is
close to five, a result similar to the experimental observations
of Wu [18] on land use categories.
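The sample-size arithmetic above, together with the per-class ratios quoted from the literature, can be sketched as a one-line rule. Applying the per-class ratios to the smallest class (cirrus, 24 samples) is an illustrative reading, not a computation reported in the text.

```python
def max_features(n_samples: int, samples_per_feature: int) -> int:
    """Largest feature count d such that n_samples >= samples_per_feature * d."""
    return n_samples // samples_per_feature

CIRRUS, CUMULONIMBUS = 24, 46  # the two smallest classes, as given in the text

# Ten-to-one rule on the total of the two smallest classes (the rule used above).
print(max_features(CIRRUS + CUMULONIMBUS, 10))  # 7

# Per-class ratios applied to the smallest class (an illustrative reading).
for rule, ratio in {"3:1 per class": 3, "5:1 per class": 5}.items():
    print(rule, max_features(CIRRUS, ratio))
```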


4.3 Evaluation of Cloud Classification Systems

For selected combinations of features, classifiers,
and decision trees (Figures 5-12), the percentages of samples
correctly classified (per class and in total) are
presented in Tables 28-43. Confusion matrices corresponding
to a given experiment number and given table number
can be found in Appendix C. Each experiment in Tables 32-43
was representative of a collection of similar interactive
experiments in which given features were interchanged with
other features based on the same histogram. For example,
feature 113 from the visual brightness histogram might have
been substituted for feature 114.

The maximum likelihood classifier performed consistently
better for the four-class problem than the multiclass voting,
multiclass one-against-the-rest, or Fisher classifier with
sample a priori probabilities. The assumption of equal
covariance matrices used in the latter three classifiers proved
too restrictive for separation of the four cloud classes of low,
mix, cirrus, and cumulonimbus. However, for the three-class
problem (low, cirrus, and cumulonimbus), accuracy greater than
94% was obtained by all four types of classifiers.
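The effect of the equal-covariance assumption can be illustrated in one dimension, where it reduces to a shared variance. The sketch below uses hypothetical data, not cloud samples: one class is tightly clustered and the other widely spread, loosely analogous to a homogeneous vs. a highly variable cloud type. A Gaussian maximum likelihood rule with a separate variance per class separates the training data perfectly. The same rule with a single pooled variance reduces to a linear threshold and misclassifies the low tail of the spread-out class. Equal priors are assumed.

```python
import math
from statistics import fmean, pvariance

def fit(classes, pooled):
    """Estimate (mean, variance) per class; pooled=True gives every class one shared variance."""
    params = {c: (fmean(xs), pvariance(xs)) for c, xs in classes.items()}
    if pooled:
        n = sum(len(xs) for xs in classes.values())
        shared = sum(len(xs) * params[c][1] for c, xs in classes.items()) / n
        params = {c: (mean, shared) for c, (mean, _) in params.items()}
    return params

def classify(x, params):
    """Maximum likelihood decision with equal priors: the class with the largest log density."""
    def log_density(mean, var):
        return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)
    return max(params, key=lambda c: log_density(*params[c]))

def resubstitution_accuracy(classes, pooled):
    """Fraction of the design samples that the fitted rule assigns to their own class."""
    params = fit(classes, pooled)
    hits = sum(classify(x, params) == c for c, xs in classes.items() for x in xs)
    return hits / sum(len(xs) for xs in classes.values())

# Hypothetical one-feature data: a homogeneous class and a highly variable one.
DATA = {
    "tight": [-0.2, -0.1, 0.0, 0.1, 0.2],
    "spread": [-2.5, -1.5, 2.5, 3.5],
}
print(resubstitution_accuracy(DATA, pooled=False), resubstitution_accuracy(DATA, pooled=True))
```

On these numbers the class-specific rule scores 9 of 9 and the pooled rule 7 of 9. The pooled threshold falls midway between the two means, so the two negative samples of the spread-out class land on the wrong side.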

For the four-class problem, a comparison of Experiment 1
in Table 43 with Experiment 1 in Table 33 shows a drop in
classification accuracy from 86% to 82% when a single-stage
multiclass voting classifier was used instead of a
single-stage maximum likelihood classifier. A comparison of
Experiment 2 in Table 43 with Experiment 1 in Table 40 shows
a drop from 85% to 79% classification accuracy when
multiclass voting classifiers were used at each stage of a
multistage decision process instead of maximum likelihood
classifiers, and a similar drop, to 83%, for the Fisher
classifier with sample a priori probabilities. Experiment 5
of Table 43 shows that only 28% of the total number of
samples were correctly classified when a multiclass
one-against-the-rest classifier was substituted for a
single-stage maximum likelihood classifier. The high reject
rate of the multiclass one-against-the-rest classifier on
the four-class problem, contrasted with its accurate
performance on the three-class problem, illustrates the
ambiguity introduced into the pattern analysis problem when
non-uniformly covered cloud areas are to be identified.
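The two multiclass schemes compared above differ mainly in how they combine two-class decisions and when they refuse to decide. This section does not spell out the combination rules, so the sketch below assumes common ones: one-against-the-rest accepts a label only when exactly one "class versus the rest" decision fires and otherwise rejects, while pairwise voting lets every class-versus-class decision cast one vote and rejects on a tie. The interval detectors and class centres in the example are hypothetical one-dimensional stand-ins for trained classifiers.

```python
from collections import Counter
from itertools import combinations


def one_against_rest(x, detectors):
    """detectors: dict label -> predicate that fires when x looks like that
    class rather than any other.  Accept only if exactly one fires."""
    hits = [label for label, fires in detectors.items() if fires(x)]
    return hits[0] if len(hits) == 1 else None  # None means "reject"


def pairwise_vote(x, labels, pair_rule):
    """pair_rule(x, a, b) returns the winner of the a-versus-b decision.
    Each pair casts one vote; a tie for the most votes is a reject."""
    votes = Counter(pair_rule(x, a, b) for a, b in combinations(labels, 2))
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# Hypothetical 1-D stand-ins: overlapping intervals for the rest-detectors,
# interval centres for the pairwise (nearest-centre) decisions.
intervals = {"low": (0, 3), "mix": (2, 5), "ci": (5, 7), "cb": (7, 10)}
detectors = {k: (lambda x, lo=lo, hi=hi: lo <= x < hi)
             for k, (lo, hi) in intervals.items()}
centres = {k: (lo + hi) / 2 for k, (lo, hi) in intervals.items()}


def nearer(x, a, b):
    return a if abs(x - centres[a]) <= abs(x - centres[b]) else b


for x in (1.0, 2.5, 6.0):
    print(x, one_against_rest(x, detectors), pairwise_vote(x, list(intervals), nearer))
```

At x = 2.5 two detectors overlap, so one-against-the-rest rejects the sample while voting still returns a label; overlaps of this kind are one way a high reject rate can arise.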

The maximum feature dimensionality for classification of
cloud patterns, given the limited number of training samples
within a particular orbit, can be seen from the single-stage
maximum likelihood classification results in Tables 28-35
and Table 41 to be approximately 5 or less. For a feature
dimensionality of 1, visual brightness features (Table 28)
classified approximately 46%-53% of the samples correctly;
visual difference features (Table 29), approximately
40%-47%; infrared temperature features (Table 30), 60%-73%;
and infrared difference features (Table 31), 61%-68%.
Infrared temperature features were clearly the best group
of features, followed by infrared texture features. For
combinations of three and four features (Table 32),
classification results improved to approximately 84% and
86%, respectively. A further 2% increase in classification
accuracy, to 88%, occurred when the feature dimensionality
was increased to 5. However, for various combinations of
six features (Table 34), classification accuracy did not
increase beyond 88%, and only for a few select combinations
of seven features (see Experiment 2, Table 35) did
classification accuracy increase to 89.7%. Thus no major
increase in classification accuracy beyond the five-feature
combinations was achieved until a two-stage classification
process (see Table 36) was designed to reduce the number of
mix samples incorrectly classified as low samples.
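One plausible contributor to the ceiling of about five features is the number of parameters a Gaussian maximum likelihood classifier must estimate for each class: d means plus d(d+1)/2 distinct covariance entries. The sketch below tabulates that count and compares it with the size of the smallest class. The class size of 24 cirrus samples is an assumption inferred from the granularity of the reported percentages, not a figure stated here, and the comparison is a rough heuristic rather than the report's own analysis.

```python
def gaussian_params(d):
    """Free parameters of one d-dimensional Gaussian class model:
    d means plus d(d+1)/2 distinct covariance entries."""
    return d + d * (d + 1) // 2


SMALLEST_CLASS = 24  # ASSUMED number of cirrus training samples

table = [(d, gaussian_params(d)) for d in range(1, 8)]
print(table)
# -> [(1, 2), (2, 5), (3, 9), (4, 14), (5, 20), (6, 27), (7, 35)]
print([d for d, p in table if p > SMALLEST_CLASS])
# -> [6, 7]
```

Under this assumption the per-class parameter count first exceeds the number of cirrus samples at six features, which is at least consistent with the degradation observed when going from five to six features.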

Accuracy was increased to 91.4% by combining a seven-feature
four-class maximum likelihood classifier at level 1 of the
classification process with a six-feature two-class (low vs.
mix) maximum likelihood classifier at the second stage. Five
of the six features that reduced the confusion between low
and mix at the second level were quadrant features, which
were extracted in a crude attempt to predict how successful
features comparing segments of a sample would be in
separating uniformly covered from non-uniformly covered
cloud regions. The classification power gained from these
simple features can also be seen by comparing the results
of Experiments 2 and 3, Table 40, with the results of
Experiment 1, Table 40.
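The routing logic of the two-stage process described above can be sketched as a composition of two classifiers: the four-class first stage decides, and only samples it labels as low are passed to a low-versus-mix second stage. The rules, thresholds, and feature keys below are hypothetical placeholders for the trained maximum likelihood classifiers, not the report's values.

```python
from typing import Callable, Dict

Sample = Dict[str, float]
Rule = Callable[[Sample], str]


def two_stage(stage1: Rule, stage2: Rule) -> Rule:
    """Compose a four-class first stage with a low-vs-mix second stage.

    Only samples that stage 1 labels "low" are re-examined; every other
    stage-1 label is final.
    """
    def rule(x: Sample) -> str:
        label = stage1(x)
        return stage2(x) if label == "low" else label
    return rule


# Hypothetical stand-ins for trained classifiers; thresholds are made up.
def stage1(x: Sample) -> str:
    if x["coldest"] < 230.0:
        return "cb"
    if x["brightest"] > 200.0:
        return "ci"
    if x["stdev"] > 8.0:
        return "mix"
    return "low"


def stage2(x: Sample) -> str:
    # A large range of quadrant standard deviations suggests uneven cover.
    return "mix" if x["ran_stdev"] > 3.0 else "low"


classify_two_stage = two_stage(stage1, stage2)
print(classify_two_stage({"coldest": 280.0, "brightest": 100.0,
                          "stdev": 2.0, "ran_stdev": 5.0}))  # -> mix
```

Because the second stage only ever sees samples that the first stage called low, its training set and a priori probabilities can be tuned to that subpopulation without affecting the cirrus and cumulonimbus decisions.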

An analysis of the experiments, table by table, for Tables
28-36 leads to the following conclusions. The best single
visual brightness feature (Table 28) was feature 113, the
brightest point. There was little difference in the
performance of the visual texture features (Table 29), with
a slight preference for the diagonal directions. The best
single infrared temperature features (Table 30) were
whole-sample and quadrant features involving the coldest
temperature, ranges between the coldest temperature and
other points, and standard deviation. For infrared texture
features (Table 31), the entropy features as a group proved
superior to the mean and angular second moment (ASM)
features.
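The texture features (mean, ASM, and entropy, in horizontal, vertical, and two diagonal directions, at distances 1, 2, 4, and 8 per Tables 20-27) are computed from difference histograms. A minimal sketch follows, assuming the usual gray level difference statistics: the histogram of absolute differences between pixels separated by a displacement vector, its mean, its angular second moment (sum of squared probabilities), and its entropy. The displacement vectors assigned to the names Hor, Ver, 1D, and 2D, and the natural logarithm in the entropy, are assumptions; the report's exact conventions may differ.

```python
import math
from collections import Counter

# ASSUMED mapping of direction names to unit (dx, dy) displacements.
DIRECTIONS = {"Hor": (1, 0), "Ver": (0, 1), "1D": (1, 1), "2D": (-1, 1)}


def difference_histogram(image, dx, dy):
    """Normalised histogram of |I(r, c) - I(r + dy, c + dx)| over all pixel
    pairs that lie inside the image (image is a list of equal-length rows)."""
    rows, cols = len(image), len(image[0])
    counts = Counter()
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[abs(image[r][c] - image[r2][c2])] += 1
    n = sum(counts.values())
    return {diff: k / n for diff, k in counts.items()}


def texture_features(hist):
    """Mean difference, angular second moment, and entropy of a histogram."""
    return {
        "Mean": sum(diff * p for diff, p in hist.items()),
        "ASM": sum(p * p for p in hist.values()),
        "Ent": -sum(p * math.log(p) for p in hist.values() if p > 0),
    }


def features(image, name, distance):
    dx, dy = DIRECTIONS[name]
    return texture_features(difference_histogram(image, dx * distance, dy * distance))


print(features([[0, 0], [0, 2]], "Hor", 1))
```

A spatially uniform area concentrates the histogram at zero difference (ASM near 1, entropy near 0), while a patchy area spreads it out, so entropy is one natural candidate for detecting non-uniform cover.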

Comparison of Experiments 2 and 4 of Table 32 with
Experiment 3 of Table 32 shows that at least one visual
feature must be included for classification of cirrus and
cumulonimbus clouds. For three-feature combinations of one
visual and two infrared features, similar results were
obtained in Experiment 1 (two infrared temperature features)
and Experiment 2 (one infrared temperature feature and one
infrared texture feature). Before conducting the design
experiment, one would have expected the three-feature
combination with the texture feature to give better
results, provided that the infrared texture feature was not
highly correlated with the infrared temperature feature.

Experiments 3-7 of Table 32 illustrate the effect of
leaving out one feature from the five-feature combination
of Experiment 1 of Table 33. This combination (one visual
and four infrared features: 114, 302, 303, 314, and 315)
consists of the gray level difference between the
brightest and darkest points in the visual picture for a
given sample area, the standard deviation of temperature,
the coldest temperature, the temperature difference between
the coldest and warmest temperatures, and the temperature
difference between the coldest 10% and the warmest 10% of
the infrared temperatures. It was used as a standard
feature set (see Table 43 and Experiment 1 of Tables 37-40)
for comparing various tree skeletons and classifiers
because of its uniform ability to separate cloud patterns
accurately regardless of classifier or tree skeleton
design. Experiment 3 of Table 32 shows the result of
excluding the visual brightness range. Experiment 4 shows
that most of the information contained in the infrared
standard deviation feature can also be found in the
infrared range features 314 and 315. Experiment 5 shows
that the value of the coldest temperature is necessary for
identifying cirrus and cumulonimbus. Experiments 6 and 7,
together with Experiment 3 of Table 33, show that including
both the 0%-100% and the 10%-90% infrared range features is
essentially redundant for single-stage classification. Of
the two ranges, the 10%-90% range performed slightly better
in combination with other features. Including more than one
range feature in a combination functioned more as a
weighting factor than as an additional information source.
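The five standard features can be read as simple order statistics of a sample's brightness or temperature values. The sketch below assumes a nearest-rank definition of the cumulative-frequency points (CF0 is the minimum and CF100 the maximum, so CF0 of a temperature histogram is the coldest temperature), computes each range as the difference between two such points, and uses a population standard deviation. These definitions are a plausible reading of the feature names, not the report's documented formulas; in particular, "the coldest 10% and the warmest 10%" is approximated here by the 10th and 90th percentile values.

```python
import math
import statistics


def cf(values, p):
    """Value at which the cumulative frequency first reaches p percent
    (nearest-rank rule, an ASSUMED definition).  cf(v, 0) is the minimum
    and cf(v, 100) the maximum."""
    s = sorted(values)
    k = max(math.ceil(p * len(s) / 100) - 1, 0)
    return s[k]


def standard_features(visual, temperature):
    """Features 114, 302, 303, 314, and 315 for one sample area, under the
    assumed definitions above."""
    return {
        "114 R0-100 (visual)": cf(visual, 100) - cf(visual, 0),
        "302 StDev": statistics.pstdev(temperature),
        "303 CF0": cf(temperature, 0),
        "314 R0-100": cf(temperature, 100) - cf(temperature, 0),
        "315 R10-90": cf(temperature, 90) - cf(temperature, 10),
    }


temps = list(range(1, 11))   # toy "temperatures" 1..10
vis = [40, 90, 120, 200]     # toy visual gray levels
print(standard_features(vis, temps))
```

Features 314 and 315 are both differences between points of the same sorted list, which is one way to see why including both behaves more like a weighting factor than a new source of information.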

The five-feature standard combination, given in Experiment
1 of Table 33, resulted in 86% classification accuracy for
single-stage maximum likelihood classification. When the
quadrant standard deviation feature was substituted for the
sample standard deviation feature (Experiment 2, Table 33),
total classification accuracy remained the same.
Experiments 3, 5, and 8 change the proportion of visual to
infrared features used in Experiments 2, 4, and 7 from 1
visual and 4 infrared features to 2 visual and 3 infrared
features. In each of Experiments 3, 5, and 8,
classification accuracy improved over the corresponding
Experiments 2, 4, and 7. The all-texture five-feature
combinations of Experiments 4 and 5 resulted in 78.6% and
80.7% accuracy, respectively, compared with 86.0% and 86.4%
for the no-texture five-feature combinations of Experiments
2 and 3. The five-feature combinations of Experiments 7 and
8, which included one infrared entropy feature, performed
best of any five-feature combination, with classification
accuracies of 87.2% and 88.1%, respectively. Experiment 9
showed that, even with the addition of an infrared entropy
feature, classification results fell below 80% for
five-feature combinations if no information was available
from the visual picture. Experiment 6 showed that, for the
five-feature combinations tried, combinations of 3 texture
and 2 non-texture features performed worse than
all-texture combinations, all-non-texture combinations, and
combinations of 1 texture and 4 non-texture features.

The experiments of Table 34 showed that adding several
potentially good discriminating features to the combinations
reported in Table 33 produced no substantial increase in
classification accuracy. Experiment 2 of Table 34 added the
infrared range feature 314 to the best combination in Table
33 (Experiment 8); no improvement resulted. Experiment 1
added the visual range feature 115 to the standard
five-feature combination of Experiment 1, Table 33. A slight
improvement in the classification of cirrus and
cumulonimbus raised total classification accuracy from
86.0% to 87.2%. Classification results for the other
six-feature combinations of Experiments 3-6, Table 34,
ranged from 85.2% to 86.4%. With a particular seven-feature
combination (Experiment 2, Table 35) of two infrared
entropy texture features, two visual brightness features,
and three infrared temperature features, classification
accuracy rose to 89.7%. Accuracy greater than 90% was
achieved only by changing from a single-level classifier
(Figure 5) to a two-level decision tree (Figure 6).

The experiments in Table 36 show that a 2% increase in
classification accuracy resulted from a second pass over
the samples that the four-class maximum likelihood
classifier had labeled as low cloud on the first pass. On
this second pass, those samples were separated into low and
mix using quadrant features combined with a maximum
likelihood classifier trained on all the low samples and
all the mix samples.

Had the maximum likelihood classifier for the second stage
of the decision process been trained only on mix samples in
which low cloud predominated within the sample, or had the
a priori probabilities been adjusted to reflect the uneven
proportion of low and mix clouds arriving at the second
stage of the decision tree, classification results would
probably have improved. Thus, the results in Table 36
represent the minimum classification accuracy achievable
with this two-stage design for eliminating confusion
between low and mix samples.
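The effect of adjusting a priori probabilities at the second stage can be seen in a one-feature sketch. Under a Gaussian model, each class score is the log likelihood plus the log prior, so shifting prior weight toward low clouds moves the low/mix decision boundary toward the mix class. The class means, the common standard deviation, and the 0.8/0.2 priors are hypothetical numbers chosen for illustration, not values from the report.

```python
import math


def log_gaussian(x, mu, sigma):
    return -math.log(sigma * math.sqrt(2 * math.pi)) - 0.5 * ((x - mu) / sigma) ** 2


def bayes_label(x, classes):
    """classes: dict label -> (mean, std, prior); the largest posterior wins."""
    return max(classes, key=lambda k: log_gaussian(x, *classes[k][:2])
               + math.log(classes[k][2]))


def boundary(mu_low, mu_mix, sigma, p_low, p_mix):
    """Decision point for two equal-variance Gaussians with mu_low < mu_mix;
    samples below it are labelled low."""
    return (mu_low + mu_mix) / 2 + sigma ** 2 * math.log(p_low / p_mix) / (mu_mix - mu_low)


equal = {"low": (1.0, 1.0, 0.5), "mix": (3.0, 1.0, 0.5)}
skewed = {"low": (1.0, 1.0, 0.8), "mix": (3.0, 1.0, 0.2)}
print(bayes_label(2.2, equal), bayes_label(2.2, skewed))   # -> mix low
print(boundary(1.0, 3.0, 1.0, 0.5, 0.5), round(boundary(1.0, 3.0, 1.0, 0.8, 0.2), 3))
# -> 2.0 2.693
```

The same feature value flips from mix to low once the priors reflect that most samples reaching this node are low clouds, which is the kind of adjustment the paragraph above suggests.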

The mix and low samples that traveled down the left branch
of decision tree 3 (see Figure 7) were easily separated by
either the standard feature set or the quadrant features,
as can be seen from the confusion matrices for Experiments
1-3 of Table 37. Also, there was no confusion at decision
node 2.2 between the cirrus and cumulonimbus samples that
arrived at that node in any of Experiments 1, 2, or 3.
However, none of the feature combinations tried for
decision tree 3 could solve the problem that, at level 1,
several mix clouds were classified into the
cirrus-cumulonimbus group and many cirrus and cumulonimbus
clouds were classified into the low-mix group.

In decision tree 4 (Figure 8), an attempt was made to
separate mix clouds from the others at the top of the tree.
The quadrant features designed to separate mix from low
clouds were not adequate to separate mix from the combined
set of low, cirrus, and cumulonimbus samples, since quadrant
features such as the maximum standard deviation of
temperature are high for cirrus and cumulonimbus as well as
for mix. The total classification accuracy (Experiment 2,
Table 38) was only 76% for decision tree 4 with quadrant
features at level 1 of the classification process.
Classification accuracies for Experiments 1 and 3, Table
38, in which no quadrant features were used, were 84.0% and
81.5%, respectively, with the classifier design at level 1
being the major source of error.
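The quadrant features (MaxStDev, RanStDev, MinCF0, RanCF0, and the analogous MaxR10-90 and RanR10-90) compare statistics across the four quadrants of a sample. The sketch below follows one plausible reading of their names, which is an assumption: compute a statistic in each quadrant, then take its maximum, minimum, or range (maximum minus minimum) over the four quadrants. Under this reading a single quadrant with a strong temperature gradient is enough to make MaxStDev large, whatever the cloud type, which fits the observation above that it is high for cirrus and cumulonimbus as well as for mix.

```python
import statistics


def quadrants(image):
    """Split a 2-D list (even height and width assumed) into its four quadrants."""
    h, w = len(image) // 2, len(image[0]) // 2
    return [[row[c0:c0 + w] for row in image[r0:r0 + h]]
            for r0 in (0, h) for c0 in (0, w)]


def flat(block):
    return [v for row in block for v in row]


def quadrant_features(temps):
    """ASSUMED reading of the feature names: compute a statistic per quadrant,
    then take the max, min, or range (max - min) over the four quadrants."""
    stdevs = [statistics.pstdev(flat(q)) for q in quadrants(temps)]
    coldest = [min(flat(q)) for q in quadrants(temps)]
    return {
        "MaxStDev": max(stdevs),
        "RanStDev": max(stdevs) - min(stdevs),
        "MinCF0": min(coldest),
        "RanCF0": max(coldest) - min(coldest),
    }


# Toy 4x4 temperature field: one variable quadrant, three uniform ones.
temps = [
    [200, 260, 280, 280],
    [260, 200, 280, 280],
    [280, 280, 280, 280],
    [280, 280, 280, 280],
]
print(quadrant_features(temps))
```

Under this reading MinCF0 is simply the coldest value in the whole sample, the same quantity as feature 303, which is consistent with features 303 and 467 giving identical results in Table 30.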

For decision tree 5 (Figure 9), quadrant features performed
better at level 1 (see Experiment 3, Table 39) for
separating low from the combined set of mix, cirrus, and
cumulonimbus clouds; 85.2% of the samples were correctly
classified. Non-quadrant features performed almost as well
at level 1 (see Experiments 1 and 2, Table 39), with
resulting total classification accuracies of approximately
84%.

The multi-level binary tree skeleton that offered
consistently superior performance for various feature
combinations was tree 6 (Figure 10). In this tree, the
overlap between the low and mix classes was addressed at
the last stage of the decision process, and the problem for
the first stage was the relatively easy separation of
cumulonimbus clouds from the combined set of mix, low, and
cirrus clouds. The first stage produced confusion only
between mix and cumulonimbus clouds (see the confusion
matrices for Experiments 1-3, Table 40). With quadrant
features at the last stage of the decision tree
(Experiments 2 and 3), classification accuracies were
approximately 86%.

The problem of identifying mix clouds was disregarded
entirely in the set of experiments in Table 41. The three
remaining cloud types were separated with 98% classification
accuracy by a single-stage maximum likelihood classifier
(Figure 11) using several two-feature non-texture
combinations of one visual and one infrared feature (see
Experiments 3 and 4, Table 41). Infrared features that
performed well were temperature ranges, temperature
standard deviation, and coldest temperature values. Visual
features that performed well were the highest (brightest)
gray level value and ranges of visual brightness values.


The feature dimensionality for the three-class problem was
reduced from 2 to 1 feature per decision by changing from
the single-stage decision process of Figure 11 to the
multistage decision process outlined in Figure 12. The
three-class problem was resolved into separation of low
clouds from cirrus and cumulonimbus clouds at level 1 and
separation of cirrus from cumulonimbus clouds at level 2. A
total classification accuracy of 98.7% (Experiment 1, Table
42) was achieved by using feature 302, standard deviation
of temperature, at level 1 and feature 113, brightest
visible gray level value, at level 2.
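The two-level tree of Figure 12 uses one feature per decision node. At a node with a single feature, a two-class Gaussian maximum likelihood rule reduces to comparing the feature with one or two thresholds, so the tree can be sketched with plain comparisons. The thresholds and the directions of the inequalities below (low clouds having a small temperature standard deviation, cumulonimbus being brighter than cirrus in the visible channel) are assumptions for illustration; the report's trained classifiers are not reproduced here.

```python
def tree8(stdev_temp, brightest_vis, t_stdev, t_bright):
    """Two-level decision for the three-class (low, cirrus, cumulonimbus) problem.

    Level 1 uses feature 302 (temperature standard deviation): low vs. {Ci, Cb}.
    Level 2 uses feature 113 (brightest visible gray level): Ci vs. Cb.
    t_stdev and t_bright are HYPOTHETICAL thresholds, and the inequality
    directions are assumptions.
    """
    if stdev_temp < t_stdev:
        return "low"
    return "cb" if brightest_vis > t_bright else "ci"


samples = [(1.2, 140), (6.5, 240), (5.0, 170)]  # toy (feature 302, feature 113) pairs
labels = [tree8(s, b, t_stdev=3.0, t_bright=200) for s, b in samples]
print(labels)  # -> ['low', 'cb', 'ci']
```

Each sample needs at most two scalar comparisons, which illustrates the computational advantage claimed for the hierarchical design over a single-stage classifier that evaluates a multivariate density for every class.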

5. Conclusions and Plans for Further Research
5.1 Conclusions

Features characterizing density distributions of infrared
gray level values, visual gray level values, pairs of
infrared gray level values separated by a specified
displacement vector, and pairs of visual gray level values
separated by a specified displacement vector can
successfully discriminate selected categories of sample
areas of tropical cloud patterns presently used for
derivation of wind velocity vectors. The areas that can be
differentiated are areas uniformly covered by low clouds,
areas uniformly covered by cirrus clouds, and areas
partially or completely covered by cumulonimbus clouds.
Either of two designs can be used: single-stage classifiers
with at least two appropriately selected features, one from
the infrared temperature histogram and one from the visible
brightness histogram; or a hierarchical classification
system in which, at the first stage, low clouds are
separated from cirrus and cumulonimbus by one or more
infrared features such as standard deviation of
temperature, and, at the second stage, cirrus is separated
from cumulonimbus by one or more visual features such as
the brightest visual gray level value. The hierarchical
system is preferred because of the achievable reduction in
feature dimensionality and because of its computational
efficiency.

sample area of cloud patterns cannot successfully
distinguish "mixed" areas, which are either partially
covered by cirrus with lower clouds or partially covered by
cirrus clouds and partially covered by low clouds, from the
three categories mentioned above. For both single-stage and
multistage classification systems, a minimum of five
features was needed for a classification accuracy of 88% on
the training set. Accuracy could be improved only by adding
quadrant features (features that compared different
quadrants of the sample area, instead of features based on
frequency distributions) at a second stage to decrease the
confusion between low and mix samples. Binary hierarchical
classification systems failed to improve the classification
accuracy or to reduce the feature dimensionality beyond
that achievable by single-stage maximum likelihood
classification for the four-class problem. However, if an
approach based on image segmentation techniques could be
developed to separate "mixed" clouds from the other three
categories at the first stage of the system, then the
second stage could separate low clouds from cirrus and
cumulonimbus, and the third stage could separate cirrus
from cumulonimbus, as above. This type of (as yet
unrealized) hierarchical system would definitely be
preferable to a conventional single-stage system.


5.2 Plans for Further Research

The next problem to be investigated is the application of
image segmentation and scene analysis techniques to
obtaining a meaningful description (relevant to wind
velocity estimation) of "mixed" cloud areas. The
description should result in the delineation of areas or
points from which more than one wind velocity vector can be
derived, and in the derivation of one or more numerical
features that can be used to identify "mixed" cloud areas.
The scene analysis problem will be approached at three
successive levels, depending on whether or not a successful
solution is attained at a lower level. First, the scene
analysis approach will investigate the problem using the
information available from a single pair of visual and
infrared image sample areas. If sufficient information is
not available from analysis of only one image pair (in the
time sequence of image pairs used by meteorologists as an
aid to labeling "mixed" cloud areas), the analysis will be
extended to use two pairs of infrared and visual images
separated in time by approximately half an hour. If
problems are still encountered, the regular partitioning of
sample areas will be abandoned, and context information
from neighboring samples will be added.

Figure 3. Cloud Category Map Prepared by Meteorologists for
32x32 Matrices. For an explanation of entries (1, ..., 8),
see text.

Figure 4. Cloud Category Map for Sample 64x64 Data Region
for Categories L (low clouds), M (mixed clouds),
Ci (cirrus clouds), and Cb (cumulonimbus clouds).

Figure 5. Decision Tree 1.

Figure 6. Decision Tree 2.

Figure 7. Decision Tree 3.

Figure 8. Decision Tree 4.

Figure 9. Decision Tree 5.

Figure 10. Decision Tree 6.

Figure 11. Decision Tree 7.

Figure 12. Decision Tree 8.


Classification Goals / Relation to Wind Extraction Programs

1. Goal: Identify areas containing cumulonimbus clouds.
   Relation: Reject areas containing cumulonimbus clouds
   from wind extraction programs, since movement of
   cumulonimbus clouds does not, in general, correspond to
   movement of the horizontal wind.

2. Goal: Identify areas containing predominantly
   single-layer, low-level clouds.
   Relation: Postulate for the wind extraction program that
   cloud tracers located in these areas are moving in the
   same direction and at the same speed. Postulate that the
   emissivity of cloud tracers is unity.

3. Goal: Identify areas containing predominantly high-level
   clouds.
   Relation: Check whether or not major cloud tracers located
   in these areas are moving in the same direction and at the
   same speed. Choose only those tracers sufficiently opaque
   to appear white in both visible and infrared images.
   Postulate emissivity values between 0.75 and unity.

4. Goal: Identify areas containing predominantly
   multi-layered clouds.
   Relation: Postulate that cloud tracers may be moving at
   different speeds and in different directions. Identify
   low cloud tracers and high cloud tracers, specifying
   emissivity values for high tracers depending on opacity.

Table 1. Relation of Classification Goals to
Automatic Wind Velocity Extraction.


TABLES 2-17
Feature Statistics

Table   Histogram type
2       Visual brightness
3       Visual brightness
4       Visual brightness
5       Visual brightness
6       Infrared temperature
7       Infrared temperature
8       Infrared temperature
9       Infrared temperature
10      Visual difference
11      Visual difference
12      Visual difference
13      Visual difference
14      Infrared difference
15      Infrared difference
16      Infrared difference
17      Infrared difference

TABLES 18-27
Fisher Distances

Table   Histogram type          Distance
18      Visual brightness       -
19      Infrared temperature    -
20      Visual difference       1
21      Visual difference       2
22      Visual difference       4
23      Visual difference       8
24      Infrared difference     1
25      Infrared difference     2
26      Infrared difference     4
27      Infrared difference     8

CLOUD PATTERN CLASSIFICATION FROM VISIBLE AND INFRARED DATA* (U)

FEB  76  J PARIKH  F44620-72-C-0062 

UNCLASSIFIED  TR-442  AFOSR-TR-76-1137  NL 


EXPERIMENT  FEATURE SELECTION              PERCENTAGE OF SAMPLES CORRECTLY CLASSIFIED
NUMBER      Level  Feature (Number, Name)  Low    Mix    Ci     Cb     Total
1           1      (108, CF50)             55.8   57.5   0.0    47.8   49.4
2           1      (113, CF100)            39.5   64.4   0.0    84.8   53.1
3           1      (114, R0-100)           41.9   66.7   0.0    73.9   52.7
4           1      (115, R10-90)           62.8   54.0   0.0    58.7   52.7
5           1      (116, R0-50)            72.1   37.9   0.0    50.0   48.6
6           1      (117, R50-100)          40.7   71.3   0.0    37.0   46.9
7           1      (118, R20-80)           68.6   50.6   0.0    50.0   51.9

DECISION LOGIC
STRUCTURE: Tree 1
LEVEL 1:
CLASSIFIER: Maximum Likelihood
TRAINING SETS: Low, Mix, Cirrus, Cumulonimbus

Table 28. Maximum Likelihood Single-Level Classification
for Single Features Extracted from Visual Brightness
Histograms.


EXPERIMENT  FEATURE SELECTION              PERCENTAGE OF SAMPLES CORRECTLY CLASSIFIED
NUMBER      Level  Feature (Number, Name)  Low    Mix    Ci     Cb     Total
1           1      (121, MeanHor)          68.6   31.0   0.0    41.3   43.2
2           1      (122, MeanVer)          45.4   55.2   0.0    47.8   44.9
3           1      (123, Mean1D)           53.5   49.4   0.0    47.8   45.7
4           1      (124, Mean2D)           43.0   58.6   0.0    54.4   46.5
5           1      (139, ASMHor)           37.2   43.7   29.2   41.3   39.5
6           1      (140, ASMVer)           38.4   46.0   4.2    78.3   45.3
7           1      (141, ASM1D)            39.5   47.1   4.2    80.4   46.5
8           1      (142, ASM2D)            38.4   44.8   4.2    76.1   44.4
9           1      (148, EntHor)           51.2   46.0   0.0    45.7   43.2
10          1      (149, EntVer)           36.1   59.8   0.0    63.0   46.1
11          1      (150, Ent1D)            39.5   54.0   0.0    63.0   45.3
12          1      (151, Ent2D)            37.2   59.8   0.0    63.0   46.5

DECISION LOGIC
STRUCTURE: Tree 1
LEVEL 1:
CLASSIFIER: Maximum Likelihood
TRAINING SETS: Low, Mix, Cirrus, Cumulonimbus

Table 29. Maximum Likelihood Single-Level Classification for
Single Features Extracted from Visual Difference Histograms.


EXPERIMENT  FEATURE SELECTION              PERCENTAGE OF SAMPLES CORRECTLY CLASSIFIED
NUMBER      Level  Feature (Number, Name)  Low    Mix    Ci     Cb     Total
1           1      (302, StDev)            95.4   72.4   0.0    58.7   70.8
2           1      (303, CF0)              95.4   71.3   0.0    73.9   73.3
3           1      (308, CF50)             87.2   41.4   12.5   54.4   57.2
4           1      (314, R0-100)           95.4   70.1   0.0    76.1   73.3
5           1      (315, R10-90)           97.7   66.7   0.0    58.7   69.6
6           1      (316, R0-50)            96.5   77.0   0.0    45.7   70.4
7           1      (317, R50-100)          89.5   51.7   0.0    52.2   60.1
8           1      (318, R20-80)           94.2   62.1   0.0    50.0   65.0
9           1      (465, MaxR10-90)        95.4   69.0   0.0    56.5   69.1
10          1      (466, RanR10-90)        95.4   65.5   0.0    28.3   62.6
11          1      (467, MinCF0)           95.4   71.3   0.0    73.9   73.3
12          1      (468, RanCF0)           95.4   70.1   0.0    30.4   64.6
13          1      (469, MaxStDev)         95.4   72.4   0.0    58.7   70.8
14          1      (470, RanStDev)         95.4   65.5   0.0    32.6   63.4

DECISION LOGIC
STRUCTURE: Tree 1
LEVEL 1:
CLASSIFIER: Maximum Likelihood
TRAINING SETS: Low, Mix, Cirrus, Cumulonimbus

Table 30. Maximum Likelihood Single-Level Classification for
Single Features Extracted from Infrared Temperature
Histograms.


EXPERIMENT  FEATURE SELECTION              PERCENTAGE OF SAMPLES CORRECTLY CLASSIFIED
NUMBER      Level  Feature (Number, Name)  Low    Mix    Ci     Cb     Total
1           1      (321, MeanHor)          96.5   56.3   0.0    52.2   64.2
2           1      (322, MeanVer)          94.2   59.8   0.0    56.5   65.4
3           1      (323, Mean1D)           95.4   59.8   0.0    56.5   65.8
4           1      (324, Mean2D)           93.0   58.6   0.0    54.4   64.2
5           1      (339, ASMHor)           94.2   46.0   0.0    63.0   61.7
6           1      (340, ASMVer)           87.2   47.1   0.0    78.3   62.6
7           1      (341, ASM1D)            89.5   49.4   0.0    80.4   64.2
8           1      (342, ASM2D)            87.2   47.1   0.0    80.4   63.0
9           1      (348, EntHor)           96.5   54.0   0.0    60.9   65.0
10          1      (349, EntVer)           93.0   59.8   0.0    73.9   68.3
11          1      (350, Ent1D)            93.0   59.8   0.0    73.9   68.3
12          1      (351, Ent2D)            91.9   59.8   0.0    73.9   67.9

DECISION LOGIC
STRUCTURE: Tree 1
LEVEL 1:
CLASSIFIER: Maximum Likelihood
TRAINING SETS: Low, Mix, Cirrus, Cumulonimbus

Table 31. Maximum Likelihood Single-Level Classification for
Single Features Extracted from Infrared Difference
Histograms.


EXPERIMENT  FEATURE SELECTION                     PERCENTAGE OF SAMPLES CORRECTLY CLASSIFIED
NUMBER      Level  Features (Number, Name)        Low    Mix    Ci     Cb     Total
1           1      (114, R0-100), (314, R0-100),  94.2   77.0   70.8   84.8   84.0
                   (315, R10-90)
2           1      (114, R0-100), (314, R0-100),  95.4   74.7   75.0   84.8   84.0
                   (350, Ent1D)
3           1      (302, StDev), (303, CF0),      95.4   74.7   54.2   71.7   79.4
                   (314, R0-100), (315, R10-90)
4           1      (114, R0-100), (303, CF0),     94.2   80.5   75.0   87.0   86.0
                   (314, R0-100), (315, R10-90)
5           1      (114, R0-100), (302, StDev),   95.4   79.3   66.7   84.8   84.8
                   (314, R0-100), (315, R10-90)
6           1      (114, R0-100), (302, StDev),   96.5   79.3   75.0   87.0   86.4
                   (303, CF0), (315, R10-90)
7           1      (114, R0-100), (302, StDev),   94.2   79.3   75.0   87.0   85.6
                   (303, CF0), (314, R0-100)

DECISION LOGIC
STRUCTURE: Tree 1
LEVEL 1:
CLASSIFIER: Maximum Likelihood
TRAINING SETS: Low, Mix, Cirrus, Cumulonimbus

Table 32. Maximum Likelihood Single-Level Classification
for Selected Combinations of Three and Four Features.


EXPERIMENT  FEATURE SELECTION                     PERCENTAGE OF SAMPLES CORRECTLY CLASSIFIED
NUMBER      Level  Features (Number, Name)        Low    Mix    Ci     Cb     Total

                   (302, StDev), (314, R0-100)
                   (R0-100), (R0-100), (MaxStDev)
                   (113, CF100), (303, CF0), (315, R10-90)
                   (ASM2D), (ASM1D), (Ent2D)
5           1      (141, ASM1D), (151, Ent2D),    98.8   65.5   62.5   84.8   80.7
                   (324, Mean2D), (342, ASM2D),
                   (351, Ent2D)
                   (302, StDev), (314, R0-100), (350, Ent1D)

DECISION LOGIC
STRUCTURE: Tree 1
LEVEL 1:
CLASSIFIER: Maximum Likelihood
TRAINING SETS: Low, Mix, Cirrus, Cumulonimbus

Table 33. Maximum Likelihood Single-Level Classification
for Selected Combinations of Five Features.


EXPERIMENT  FEATURE SELECTION                     PERCENTAGE OF SAMPLES CORRECTLY CLASSIFIED
NUMBER      Level  Features (Number, Name)        Low    Mix    Ci     Cb     Total
1           1      (114, R0-100), (115, R10-90),  94.2   81.6   79.2   89.1   87.2
                   (302, StDev), (303, CF0),
                   (314, R0-100), (315, R10-90)
2           1      (113, CF100), (114, R0-100),   97.7   79.3   75.0   93.5   88.1
                   (303, CF0), (314, R0-100),
                   (315, R10-90), (350, Ent1D)
3           1      (114, R0-100), (302, StDev),   95.4   81.6   66.7   89.1   86.4
                   (303, CF0), (314, R0-100),
                   (315, R10-90), (350, Ent1D)
4           1      (114, R0-100), (302, StDev),   94.2   81.6   66.7   84.8   85.2
                   (303, CF0), (314, R0-100),
                   (315, R10-90), (324, Mean2D)
5           1      (114, R0-100), (302, StDev),   95.4   80.5   79.2   84.8   86.4
                   (303, CF0), (314, R0-100),
                   (315, R10-90), (317, R50-100)
6           1      (114, R0-100), (302, StDev),   94.2   81.6   70.8   84.8   85.6
                   (303, CF0), (314, R0-100),
                   (315, R10-90), (469, MaxStDev)

DECISION LOGIC
STRUCTURE: Tree 1
LEVEL 1:
CLASSIFIER: Maximum Likelihood
TRAINING SETS: Low, Mix, Cirrus, Cumulonimbus

Table 34. Maximum Likelihood Single-Level Classification
for Selected Combinations of Six Features.


EXPERIMENT  FEATURE SELECTION                                    PERCENTAGE OF SAMPLES CORRECTLY CLASSIFIED
NUMBER      Level  Features (Number, Name)                       Low    Mix    Ci     Cb     Total

1           1      (114, R0-100), (115, R10-90), (302, StDev),   94.2   83.9   79.2   93.5   88.9
                   (303, CF0), (314, R0-100), (315, R10-90),
                   (350, Ent1D)
2           1      (113, CF100), (114, R0-100), (303, CF0),      97.7   82.8   79.2   93.5   89.7
                   (314, R0-100), (315, R10-90), (350, Ent1D),
                   (351, Ent2D)
3           1      (114, R0-100), (115, R10-90), (302, StDev),   94.2   82.8   79.2   87.0   87.2
                   (303, CF0), (314, R0-100), (315, R10-90),
                   (317, R50-100)

DECISION  LOGIC 

STRUCTURE:  Tree  1 

LEVEL  1: 


CLASSIFIER:  Maximum  Likelihood 

TRAINING  SETS:  Low,  Mix,  Cirrus,  Cumulonimbus 




EXPERIMENT  FEATURE SELECTION                                    PERCENTAGE OF SAMPLES CORRECTLY CLASSIFIED
NUMBER      Level  Features (Number, Name)                       Low    Mix    Ci     Cb     Total

                   (114, R0-100), (115, R10-90), (302, StDev),   94.2   90.8   79.2   93.5   91.4
                   (303, CF0), (314, R0-100), (315, R10-90),
                   (350, Ent1D)

                   (302, StDev), (465, MaxR10-90),
                   (466, RanR10-90), (467, MinCF0),
                   (468, RanCF0), (469, MaxStDev)

                   (114, R0-100), (115, R10-90), (302, StDev),
                   (303, CF0), (314, R0-100), (315, R10-90),
                   (350, Ent1D)

                   (303, R0-100), (315, R10-90), (317, R50-100),
                   (467, MinCF0), (468, RanCF0), (469, MaxStDev)

                   (114, R0-100), (115, R10-90), (302, StDev),
                   (303, CF0), (314, R0-100), (315, R10-90),
                   (350, Ent1D)

                   (114, R0-100), (115, R10-90), (302, StDev),
                   (303, CF0), (314, R0-100), (315, R10-90),
                   (317, R50-100)

                   (114, R0-100), (302, StDev), (303, CF0),
                   (314, R0-100), (315, R10-90)

                   (302, StDev), (465, MaxR10-90),
                   (466, RanR10-90), (467, MinCF0),
                   (468, RanCF0), (469, MaxStDev)

                                                                 94.2
                                                                 79.2
                                                                 91.0
                                                                 79.2   93.5
                                                                 87.0   86.5


DECISION  LOGIC 

STRUCTURE:  Tree  2

LEVEL  1: 

CLASSIFIER:  Maximum  Likelihood 

TRAINING  SETS:  Low,  Mix,  Cirrus,  Cumulonimbus 

LEVEL  2: 

CLASSIFIER:  Maximum  Likelihood 

TRAINING  SETS:  Low,  Mix 


Table  36.  Maximum  Likelihood  Two-Level  Classification 
Designed  to  Reduce  Confusion  of  Mix  With 
Low  Samples. 


EXPERIMENT  FEATURE SELECTION                                    PERCENTAGE OF SAMPLES CORRECTLY CLASSIFIED
NUMBER      Level  Features (Number, Name)                       Low    Mix    Ci     Cb     Total

1           1      (114, R0-100), (302, StDev), (303, CF0),      94.2   69.0   75.0   82.6   81.1
                   (314, R0-100), (315, R10-90)
            2.1    (114, R0-100), (302, StDev), (303, CF0),
                   (314, R0-100), (315, R10-90)
            2.2    (114, R0-100), (302, StDev), (303, CF0),
                   (314, R0-100), (315, R10-90)
2           1      (113, CF100), (114, R0-100), (314, R0-100),   94.2   64.4   62.5   87.0   79.0
                   (315, R10-90), (350, Ent1D)
            2.1    (302, StDev), (465, MaxR10-90),
                   (466, RanR10-90), (467, MinCF0),
                   (468, RanCF0), (469, MaxStDev)
            2.2    (113, CF100), (114, R0-100)
3           1      (114, R0-100), (115, R10-90), (302, StDev),   94.?   --     75.0   87.0   82.7
                   (303, CF0), (314, R0-100), (315, R10-90),
                   (351, Ent2D)
            2.1    (302, StDev), (465, MaxR10-90),
                   (466, RanR10-90), (467, MinCF0),
                   (468, RanCF0), (469, MaxStDev)
            2.2    (113, CF100)


DECISION  LOGIC 


STRUCTURE:  Tree  3 

LEVEL  1: 

CLASSIFIER:  Maximum  Likelihood 

TRAINING  SETS:  Set  of  Mix  and  Low  Samples,  Set  of  Cirrus  and 

Cumulonimbus  Samples 

LEVEL  2.1: 

CLASSIFIER:  Maximum  Likelihood 

TRAINING  SETS:  Mix,  Low 

LEVEL  2.2: 

CLASSIFIER:  Maximum  Likelihood 

TRAINING  SETS:  Cirrus,  Cumulonimbus 


Table  37.  Maximum  Likelihood  Two-Level  Classification  Grouping 
Mix  and  Low  Samples  into  One  Class  and  Cirrus  and 
Cumulonimbus  Samples  into  a Second  Class. 


EXPERIMENT  FEATURE SELECTION                                    PERCENTAGE OF SAMPLES CORRECTLY CLASSIFIED
NUMBER      Level  Features (Number, Name)                       Low    Mix    Ci     Cb     Total

1           1      (114, R0-100), (302, StDev), (303, CF0),      95.4   73.6   66.7   91.3   84.0
                   (314, R0-100), (315, R10-90)
            2      (114, R0-100), (302, StDev), (303, CF0),
                   (314, R0-100), (315, R10-90)
            3      (114, R0-100), (302, StDev), (303, CF0),
                   (314, R0-100), (315, R10-90)
2           1      (302, StDev), (465, MaxR10-90),               --     58.6   66.7   80.4   76.1
                   (466, RanR10-90), (467, MinCF0),
                   (468, RanCF0), (469, MaxStDev)
            2      (302, StDev), (314, R0-100)
            3      (113, CF100), (114, R0-100)
3           1      (114, R0-100), (115, R10-90), (302, StDev),   76.7   83.9   66.7   93.5   81.5
                   (303, CF0), (314, R0-100), (315, R10-90),
                   (350, Ent1D)
            2      (302, StDev), (314, R0-100)
            3      (113, CF100), (114, R0-100)


DECISION  LOGIC 

STRUCTURE:  Tree  4 

LEVEL  1: 


CLASSIFIER:  Maximum  Likelihood 

TRAINING  SETS:  Mix,  Set  of  Cirrus,  Cumulonimbus,  and  Low 

Samples 

LEVEL  2: 


CLASSIFIER:  Maximum  Likelihood 

TRAINING  SETS:  Low,  Set  of  Cirrus  and  Cumulonimbus  Samples 

LEVEL  3: 

CLASSIFIER:  Maximum  Likelihood 

TRAINING  SETS:  Cirrus,  Cumulonimbus 


Table 38. Maximum Likelihood Three-Level Classification
Designed to Separate Mix Samples at Level 1 of
the Decision Tree.




EXPERIMENT  FEATURE SELECTION                                    PERCENTAGE OF SAMPLES CORRECTLY CLASSIFIED
NUMBER      Level  Features (Number, Name)                       Low    Mix    Ci     Cb     Total

1           1      (114, R0-100), (302, StDev), (303, CF0),      96.5   77.0   66.7   82.6   84.0
                   (314, R0-100), (315, R10-90)
            2      (114, R0-100), (302, StDev), (303, CF0),
                   (314, R0-100), (315, R10-90)
            3      (114, R0-100), (302, StDev), (303, CF0),
                   (314, R0-100), (315, R10-90)
2           1      (113, CF100), (114, R0-100), (302, StDev),    96.5   75.9   70.8   84.8   84.4
                   (303, CF0), (314, R0-100), (315, R10-90),
                   (351, Ent2D)
            2      (113, CF100), (114, R0-100), (302, StDev),
                   (303, CF0), (314, R0-100), (315, R10-90),
                   (351, Ent1D)
            3      (113, CF100), (114, R0-100)
3           1      (302, StDev), (465, MaxR10-90),               94.2   80.5   66.7   87.0   85.2
                   (466, RanR10-90), (467, MinCF0),
                   (468, RanCF0), (469, MaxStDev)
            2      (114, R0-100), (115, R10-90), (302, StDev),
                   (303, CF0), (314, R0-100), (315, R10-90),
                   (351, Ent1D)
            3      (113, CF100), (114, R0-100)


DECISION  LOGIC 

STRUCTURE:  Tree  5 

LEVEL  1: 

CLASSIFIER:  Maximum  Likelihood 

TRAINING  SETS:  Low,  Set  of  Mix,  Cirrus,  and  Cumulonimbus  Samples 

LEVEL  2: 

CLASSIFIER:  Maximum  Likelihood

TRAINING  SETS:  Mix,  Set  of  Cirrus  and  Cumulonimbus  Samples 

LEVEL  3: 

CLASSIFIER:  Maximum  Likelihood 

TRAINING  SETS:  Cirrus,  Cumulonimbus 

Table  39.  Maximum  Likelihood  Three-Level  Classification  Designed 

to  Separate  Mix  Samples  at  Level  2 of  the  Decision  Tree. 


EXPERIMENT  FEATURE SELECTION                                    PERCENTAGE OF SAMPLES CORRECTLY CLASSIFIED
NUMBER      Level  Features (Number, Name)                       Low    Mix    Ci     Cb     Total

1           1      (114, R0-100), (302, StDev), (303, CF0),      94.2   75.9   75.0   89.1   84.8
                   (314, R0-100), (315, R10-90)
            2      (114, R0-100), (302, StDev), (303, CF0),
                   (314, R0-100), (315, R10-90)
            3      (114, R0-100), (302, StDev), (303, CF0),
                   (314, R0-100), (315, R10-90)
2           1      (114, R0-100), (302, StDev), (303, CF0),      94.2   80.5   75.0   89.1   86.4
                   (314, R0-100), (315, R10-90)
            2      (114, R0-100), (302, StDev), (303, CF0),
                   (314, R0-100), (315, R10-90)
            3      (302, StDev), (465, MaxR10-90),
                   (466, RanR10-90), (467, MinCF0),
                   (468, RanCF0), (469, MaxStDev)
3           1      (114, R0-100), (302, StDev), (303, CF0),      94.2   81.6   75.0   89.1   86.8
                   (314, R0-100), (315, R10-90)
            2      (114, R0-100), (142, ASM2D), (303, CF0),
                   (314, R0-100), (350, Ent1D)
            3      (302, StDev), (465, MaxR10-90),
                   (466, RanR10-90), (467, MinCF0),
                   (468, RanCF0), (469, MaxStDev)


DECISION  LOGIC 

STRUCTURE:  Tree  6 


LEVEL  1: 

CLASSIFIER:  Maximum  Likelihood 

TRAINING  SETS:  Cumulonimbus,  Set  of  Low,  Mix,  and  Cirrus  Samples 

LEVEL  2: 


CLASSIFIER:  Maximum  Likelihood 

TRAINING  SETS:  Cirrus,  Set  of  Low  and  Mix  Samples 

LEVEL  3: 

CLASSIFIER:  Maximum  Likelihood

TRAINING  SETS:  Low,  Mix 


Table  40.  Maximum  Likelihood  Three-Level  Classification  Designed 

to  Separate  Mix  Samples  at  Level  3 of  the  Decision  Tree. 




EXPERIMENT  FEATURE SELECTION                                    PERCENTAGE OF SAMPLES CORRECTLY CLASSIFIED
NUMBER      Level  Features (Number, Name)                       Low    Ci     Cb     Total

1           1      (114, R0-100)                                 89.5   0.0    87.0   75.0
2           1      (314, R0-100)                                 98.8   50.0   89.1   88.5
3           1      (114, R0-100), (314, R0-100)                  98.8   95.8   100.0  98.7
4           1      (113, CF100), (302, StDev)                    100.0  91.7   100.0  98.7
5           1      (114, R0-100), (302, StDev), (303, CF0),      97.7   95.8   100.0  98.1
                   (314, R0-100), (315, R10-90)


DECISION  LOGIC 

STRUCTURE:  Tree  7 

LEVEL  1: 

CLASSIFIER:  Maximum  Likelihood 

TRAINING  SETS:  Low,  Cirrus,  Cumulonimbus 


Table 41. Maximum Likelihood Single-Level Classification
for the Three-Class Problem.


EXPERIMENT  FEATURE SELECTION                                    PERCENTAGE OF SAMPLES CORRECTLY CLASSIFIED
NUMBER      Level  Features (Number, Name)                       Low    Ci     Cb     Total

1           1      (302, StDev)                                  100.0  91.7   100.0  98.7
            2      (113, CF100)
2           1      (314, R0-100)                                 98.8   95.8   97.8   98.1
            2      (114, R0-100)
3           1      (302, StDev), (314, R0-100)                   98.8   95.8   100.0  98.7
            2      (113, CF100), (114, R0-100)
4           1      (114, R0-100), (302, StDev), (303, CF0),      97.7   95.8   100.0  98.1
                   (314, R0-100), (315, R10-90)
            2      (114, R0-100), (302, StDev), (303, CF0),
                   (314, R0-100), (315, R10-90)

DECISION  LOGIC 

STRUCTURE:  Tree  8 

LEVEL  1: 

CLASSIFIER:  Maximum  Likelihood 

TRAINING  SETS:  Low,  Set  of  Cirrus  and  Cumulonimbus  Samples 

LEVEL  2: 

CLASSIFIER:  Maximum  Likelihood 

TRAINING  SETS:  Cirrus,  Cumulonimbus 


Table 42. Maximum Likelihood Two-Level Classification
for the Three-Class Problem.
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DEFINITIONS  OF  HISTOGRAMS  CALCULATED  FOR  EACH  SAMPLE  AREA 


Histograms  of  Reflectivity  and  Radiation  Measurements 

a) Visual Brightness Histogram (VB)

The visual brightness histogram represents the frequency distribution of reflected solar radiation measurements from the visual (0.5 - 0.7 μm) channel, coded in a 0-255 value range.

b) Infrared Temperature Histogram (IT)

The infrared temperature histogram represents the frequency distribution of long-wave radiation measurements from the infrared (10.5 - 12.5 μm) channel, converted to temperature values in the 160 K to 330 K range (assuming emissivity values of 1.0) and then shifted by -160 K to a range of 0-170.
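The two histograms above can be sketched with NumPy as follows. This is a minimal illustration, not the original processing code: it assumes the visual counts are already integers in 0-255, that temperatures have already been derived from the infrared radiances with emissivity 1.0, and that temperatures are rounded to whole kelvins and clipped to 160-330 K before the -160 K shift. The function names are hypothetical.

```python
import numpy as np


def visual_brightness_histogram(vis):
    """VB: frequency of each visual count in the 0-255 coding (256 bins)."""
    counts = np.asarray(vis, dtype=np.int64).ravel()
    return np.bincount(counts, minlength=256)


def infrared_temperature_histogram(temp_k):
    """IT: frequency of each temperature after shifting 160-330 K to 0-170.

    Assumptions (not stated in the report): temperatures are rounded to
    whole kelvins and values outside 160-330 K are clipped to that range.
    """
    t = np.rint(np.asarray(temp_k, dtype=float)).astype(np.int64).ravel()
    shifted = np.clip(t, 160, 330) - 160   # shift by -160 K
    return np.bincount(shifted, minlength=171)
```

For example, `infrared_temperature_histogram([160.0, 330.0, 200.4])` places one count each in bins 0, 170 and 40.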
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Histograms  of  Visual  and  Infrared  Difference  Measurements 

a) Visual Difference Histogram for Direction "θ" and Distance "ρ" (VDθρ)

A visual difference histogram for direction "θ" and distance "ρ" represents the frequency distribution of absolute values of differences between pairs of brightness measurements from the visual observation array separated by "ρ" steps along an array line in the direction "θ". Directional values "θ" range over the set {Hor, Ver, 1D, 2D}, where Hor (or "horizontal") denotes the East-West direction, Ver (or "vertical") the North-South direction, 1D (or "first diagonal") the Northwest-Southeast direction, and 2D (or "second diagonal") the Northeast-Southwest direction. Distance values "ρ" range over the set {1, 2, 4, 8}, with a distance of "1" specifying adjacent pairs of measurements, a distance of "2" specifying pairs of measurements separated by a single measurement, etc.

A total of 16 visual difference histograms were calculated, one for each of the 4 × 4 (direction, distance) pairs.

b) Infrared Difference Histograms for Direction "θ" and Distance "ρ" (IDθρ)

An infrared difference histogram for direction "θ" and distance "ρ" represents the frequency distribution of absolute values of differences between pairs of rescaled temperature measurements from the infrared observation array separated by "ρ" steps along a line in the direction "θ". A total of 16 infrared difference histograms were calculated, just as for the visual differences.
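The 16 visual or infrared difference histograms defined above can be sketched as follows. The sketch assumes the observation array is stored with rows running north to south and columns running west to east, so that Hor, Ver, 1D and 2D correspond to the unit (row, column) steps (0, 1), (1, 0), (1, 1) and (1, -1); that orientation, the function names and the `nbins` parameter are assumptions for illustration. For the infrared array the same function would be applied to the rescaled temperatures with `nbins=171`.

```python
import numpy as np

# Unit (row, column) steps per direction, assuming rows run north to south
# and columns run west to east.
OFFSETS = {"Hor": (0, 1), "Ver": (1, 0), "1D": (1, 1), "2D": (1, -1)}
DISTANCES = (1, 2, 4, 8)


def difference_histogram(obs, direction, dist, nbins=256):
    """Histogram of |a - b| over all pairs of measurements that are `dist`
    steps apart along `direction` inside the observation array `obs`."""
    a = np.asarray(obs, dtype=np.int64)
    dr, dc = (step * dist for step in OFFSETS[direction])
    rows, cols = a.shape
    if abs(dr) >= rows or abs(dc) >= cols:
        # No pair of measurements is that far apart inside the array.
        return np.zeros(nbins, dtype=np.int64)
    # Bounds of the sub-array whose elements have a partner inside the array.
    r0, r1 = max(0, -dr), rows - max(0, dr)
    c0, c1 = max(0, -dc), cols - max(0, dc)
    first = a[r0:r1, c0:c1]
    second = a[r0 + dr:r1 + dr, c0 + dc:c1 + dc]
    diffs = np.abs(first - second).ravel()
    return np.bincount(diffs, minlength=nbins)


def all_difference_histograms(obs, nbins=256):
    """All 4 x 4 (direction, distance) histograms, keyed like Hor1 or 2D8."""
    return {f"{d}{p}": difference_histogram(obs, d, p, nbins)
            for d in OFFSETS for p in DISTANCES}
```

Slicing two shifted views of the array pairs every measurement with its partner in one vectorised step, which avoids an explicit loop over pixels.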

Difference histograms for the simple illustrative examples of Figure A.1 are given in Table A.1. The difference histograms were used to derive texture features, as described in Section 3.1. The feature definitions are presented in Table A.2, and the feature numbering used is defined in Tables A.3-A.4.
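Table A.2 gives the report's feature definitions, and it is not reproduced in this section. Purely as an illustration, the sketch below computes four statistics from one difference histogram using definitions commonly used for gray-level difference statistics (mean, contrast, angular second moment, entropy with the natural logarithm), and reads the M, S, N, X, R name suffixes as the mean, standard deviation, minimum, maximum and range over the four directions. Those formulas, that reading of the suffixes, and the function names are all assumptions; Table A.2 remains authoritative.

```python
import numpy as np


def difference_statistics(hist):
    """Mean, contrast, angular second moment and entropy of one difference
    histogram (assumed textbook definitions, not taken from Table A.2)."""
    h = np.asarray(hist, dtype=float)
    total = h.sum()
    if total == 0:
        raise ValueError("empty histogram: no measurement pairs at this distance")
    p = h / total                  # relative frequency of each absolute difference
    i = np.arange(p.size)          # the absolute difference each bin represents
    nz = p[p > 0]                  # 0 * log(0) is taken as 0
    return {
        "Mean": float(np.sum(i * p)),
        "Con": float(np.sum(i ** 2 * p)),
        "ASM": float(np.sum(p ** 2)),
        "Ent": float(-np.sum(nz * np.log(nz))),
    }


def direction_summary(values):
    """Hypothetical reading of the M, S, N, X, R suffixes: mean, population
    standard deviation, minimum, maximum and range of one statistic taken
    over the Hor, Ver, 1D and 2D histograms."""
    v = np.asarray(values, dtype=float)
    return {"M": v.mean(), "S": v.std(), "N": v.min(), "X": v.max(),
            "R": v.max() - v.min()}
```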
Figure A.1. Simplified Visual and Infrared Observation Matrices Used to Illustrate the Calculation of Sample Histograms.


Visual Histograms

Infrared Histograms
*VDHor1 denotes a visual difference (VD) histogram for the "horizontal" direction and a distance of "1". IDHor1 denotes an infrared difference (ID) histogram for the "horizontal" direction and a distance of "1". The other histogram names are interpreted similarly.


Table A.1. Sample Histograms Calculated from Simplified
Visual and Infrared Observation Matrices of
Figure A.1.
Table A.2. FEATURE DEFINITIONS (CONTINUED)
Feature Number   Feature Histogram(s)                  Feature Name
                                                       MeanM
126              VDHor1, VDVer1, VD1D1, VD2D1          MeanS
127              VDHor1, VDVer1, VD1D1, VD2D1          MeanN
128              VDHor1, VDVer1, VD1D1, VD2D1          MeanX
129              VDHor1, VDVer1, VD1D1, VD2D1          MeanR
130              VDHor1                                ConHor
131              VDVer1                                ConVer
132              VD1D1                                 Con1D
133              VD2D1                                 Con2D
134              VDHor1, VDVer1, VD1D1, VD2D1          ConM
135              VDHor1, VDVer1, VD1D1, VD2D1          ConS
136              VDHor1, VDVer1, VD1D1, VD2D1          ConN
137              VDHor1, VDVer1, VD1D1, VD2D1          ConX
138              VDHor1, VDVer1, VD1D1, VD2D1          ConR
139              VDHor1                                ASMHor
140              VDVer1                                ASMVer
141              VD1D1                                 ASM1D
142              VD2D1                                 ASM2D
143              VDHor1, VDVer1, VD1D1, VD2D1          ASMM
144              VDHor1, VDVer1, VD1D1, VD2D1          ASMS
145              VDHor1, VDVer1, VD1D1, VD2D1          ASMN
146              VDHor1, VDVer1, VD1D1, VD2D1          ASMX
147              VDHor1, VDVer1, VD1D1, VD2D1          ASMR
148              VDHor1                                EntHor
149              VDVer1                                EntVer
150              VD1D1                                 Ent1D
151              VD2D1                                 Ent2D
152              VDHor1, VDVer1, VD1D1, VD2D1          EntM
153              VDHor1, VDVer1, VD1D1, VD2D1          EntS
154              VDHor1, VDVer1, VD1D1, VD2D1          EntN
155              VDHor1, VDVer1, VD1D1, VD2D1          EntX
156              VDHor1, VDVer1, VD1D1, VD2D1          EntR
157              VDHor2                                MeanHor2
158              VDVer2                                MeanVer2
159              VD1D2                                 Mean1D2
160              VD2D2                                 Mean2D2
161              VDHor2, VDVer2, VD1D2, VD2D2          MeanM2
162              VDHor2, VDVer2, VD1D2, VD2D2          MeanS2
163              VDHor2, VDVer2, VD1D2, VD2D2          MeanN2
164              VDHor2, VDVer2, VD1D2, VD2D2          MeanX2
165              VDHor2, VDVer2, VD1D2, VD2D2          MeanR2
166              VDHor2                                ConHor2
167              VDVer2                                ConVer2
168              VD1D2                                 Con1D2
169              VD2D2                                 Con2D2
170              VDHor2, VDVer2, VD1D2, VD2D2          ConM2
171              VDHor2, VDVer2, VD1D2, VD2D2          ConS2
172              VDHor2, VDVer2, VD1D2, VD2D2          ConN2
173              VDHor2, VDVer2, VD1D2, VD2D2          ConX2
174              VDHor2, VDVer2, VD1D2, VD2D2          ConR2
175              VDHor2                                ASMHor2
176              VDVer2                                ASMVer2
177              VD1D2                                 ASM1D2
178              VD2D2                                 ASM2D2
179              VDHor2, VDVer2, VD1D2, VD2D2          ASMM2
180              VDHor2, VDVer2, VD1D2, VD2D2          ASMS2
181              VDHor2, VDVer2, VD1D2, VD2D2          ASMN2
182              VDHor2, VDVer2, VD1D2, VD2D2          ASMX2
183              VDHor2, VDVer2, VD1D2, VD2D2          ASMR2
184              VDHor2                                EntHor2
185              VDVer2                                EntVer2
186              VD1D2                                 Ent1D2
187              VD2D2                                 Ent2D2
188              VDHor2, VDVer2, VD1D2, VD2D2          EntM2
189              VDHor2, VDVer2, VD1D2, VD2D2          EntS2
190              VDHor2, VDVer2, VD1D2, VD2D2          EntN2
191              VDHor2, VDVer2, VD1D2, VD2D2          EntX2
192              VDHor2, VDVer2, VD1D2, VD2D2          EntR2
193              VDHor4                                MeanHor4
194              VDVer4                                MeanVer4
195              VD1D4                                 Mean1D4
196              VD2D4                                 Mean2D4
197              VDHor4, VDVer4, VD1D4, VD2D4          MeanM4
198              VDHor4, VDVer4, VD1D4, VD2D4          MeanS4
199              VDHor4, VDVer4, VD1D4, VD2D4          MeanN4
200              VDHor4, VDVer4, VD1D4, VD2D4          MeanX4
201              VDHor4, VDVer4, VD1D4, VD2D4          MeanR4
202              VDHor4                                ConHor4
203              VDVer4                                ConVer4
204              VD1D4                                 Con1D4
205              VD2D4                                 Con2D4
206              VDHor4, VDVer4, VD1D4, VD2D4          ConM4
207              VDHor4, VDVer4, VD1D4, VD2D4          ConS4
208              VDHor4, VDVer4, VD1D4, VD2D4          ConN4
209              VDHor4, VDVer4, VD1D4, VD2D4          ConX4
210              VDHor4, VDVer4, VD1D4, VD2D4          ConR4
211              VDHor4                                ASMHor4
212              VDVer4                                ASMVer4
213              VD1D4                                 ASM1D4
214              VD2D4                                 ASM2D4
215              VDHor4, VDVer4, VD1D4, VD2D4          ASMM4
216              VDHor4, VDVer4, VD1D4, VD2D4          ASMS4
217              VDHor4, VDVer4, VD1D4, VD2D4          ASMN4
218              VDHor4, VDVer4, VD1D4, VD2D4          ASMX4
219              VDHor4, VDVer4, VD1D4, VD2D4          ASMR4
220              VDHor4                                EntHor4
221              VDVer4                                EntVer4
222              VD1D4                                 Ent1D4
223              VD2D4                                 Ent2D4
224              VDHor4, VDVer4, VD1D4, VD2D4          EntM4
225              VDHor4, VDVer4, VD1D4, VD2D4          EntS4
226              VDHor4, VDVer4, VD1D4, VD2D4          EntN4
227              VDHor4, VDVer4, VD1D4, VD2D4          EntX4
228              VDHor4, VDVer4, VD1D4, VD2D4          EntR4
229              VDHor8                                MeanHor8
230              VDVer8                                MeanVer8
231              VD1D8                                 Mean1D8
232              VD2D8                                 Mean2D8
233              VDHor8, VDVer8, VD1D8, VD2D8          MeanM8
234              VDHor8, VDVer8, VD1D8, VD2D8          MeanS8
235              VDHor8, VDVer8, VD1D8, VD2D8          MeanN8
236              VDHor8, VDVer8, VD1D8, VD2D8          MeanX8
237              VDHor8, VDVer8, VD1D8, VD2D8          MeanR8
238              VDHor8                                ConHor8
239              VDVer8                                ConVer8
240              VD1D8                                 Con1D8
241              VD2D8                                 Con2D8
242              VDHor8, VDVer8, VD1D8, VD2D8          ConM8
243              VDHor8, VDVer8, VD1D8, VD2D8          ConS8
244              VDHor8, VDVer8, VD1D8, VD2D8          ConN8
245              VDHor8, VDVer8, VD1D8, VD2D8          ConX8
246              VDHor8, VDVer8, VD1D8, VD2D8          ConR8
247              VDHor8                                ASMHor8
248              VDVer8                                ASMVer8
249              VD1D8                                 ASM1D8
250              VD2D8                                 ASM2D8
251              VDHor8, VDVer8, VD1D8, VD2D8          ASMM8
252              VDHor8, VDVer8, VD1D8, VD2D8          ASMS8
253              VDHor8, VDVer8, VD1D8, VD2D8          ASMN8
254              VDHor8, VDVer8, VD1D8, VD2D8          ASMX8
255              VDHor8, VDVer8, VD1D8, VD2D8          ASMR8
256              VDHor8                                EntHor8
257              VDVer8                                EntVer8
258              VD1D8                                 Ent1D8
259              VD2D8                                 Ent2D8
260              VDHor8, VDVer8, VD1D8, VD2D8          EntM8
261              VDHor8, VDVer8, VD1D8, VD2D8          EntS8
262              VDHor8, VDVer8, VD1D8, VD2D8          EntN8
263              VDHor8, VDVer8, VD1D8, VD2D8          EntX8
264              VDHor8, VDVer8, VD1D8, VD2D8          EntR8

Table A.4

INFRARED FEATURES

Feature Number   Feature Histogram(s)                  Feature Name
301              IT                                    Mean
302              IT                                    StDev
303              IT                                    CF0
304              IT                                    CF10
305              IT                                    CF20
306              IT                                    CF30
307              IT                                    CF40
308              IT                                    CF50
309              IT                                    CF60
310              IT                                    CF70
311              IT                                    CF80
312              IT                                    CF90
313              IT                                    CF100
314              IT                                    R0-100
315              IT                                    R10-90
316              IT                                    R0-50
317              IT                                    R50-100
318              IT                                    R20-80
319              IT                                    R30-70
320              IT                                    R40-60
321              IDHor1                                MeanHor
322              IDVer1                                MeanVer
323              ID1D1                                 Mean1D
324              ID2D1                                 Mean2D
325              IDHor1, IDVer1, ID1D1, ID2D1          MeanM
326              IDHor1, IDVer1, ID1D1, ID2D1          MeanS
327              IDHor1, IDVer1, ID1D1, ID2D1          MeanN
328              IDHor1, IDVer1, ID1D1, ID2D1          MeanX
329              IDHor1, IDVer1, ID1D1, ID2D1          MeanR
330              IDHor1                                ConHor

INFRARED FEATURES (page 2)

331              IDVer1                                ConVer
332              ID1D1                                 Con1D
333              ID2D1                                 Con2D
334              IDHor1, IDVer1, ID1D1, ID2D1          ConM
335              IDHor1, IDVer1, ID1D1, ID2D1          ConS
336              IDHor1, IDVer1, ID1D1, ID2D1          ConN
337              IDHor1, IDVer1, ID1D1, ID2D1          ConX
338              IDHor1, IDVer1, ID1D1, ID2D1          ConR
339              IDHor1                                ASMHor
340              IDVer1                                ASMVer
341              ID1D1                                 ASM1D
342              ID2D1                                 ASM2D
343              IDHor1, IDVer1, ID1D1, ID2D1          ASMM
344              IDHor1, IDVer1, ID1D1, ID2D1          ASMS
345              IDHor1, IDVer1, ID1D1, ID2D1          ASMN
346              IDHor1, IDVer1, ID1D1, ID2D1          ASMX
347              IDHor1, IDVer1, ID1D1, ID2D1          ASMR
348              IDHor1                                EntHor
349              IDVer1                                EntVer
350              ID1D1                                 Ent1D
351              ID2D1                                 Ent2D
352              IDHor1, IDVer1, ID1D1, ID2D1          EntM
353              IDHor1, IDVer1, ID1D1, ID2D1          EntS
354              IDHor1, IDVer1, ID1D1, ID2D1          EntN
355              IDHor1, IDVer1, ID1D1, ID2D1          EntX
356              IDHor1, IDVer1, ID1D1, ID2D1          EntR
357              IDHor2                                MeanHor2
358              IDVer2                                MeanVer2
359              ID1D2                                 Mean1D2
360              ID2D2                                 Mean2D2

INFRARED FEATURES (page 4)

391              IDHor2, IDVer2, ID1D2, ID2D2          EntX2
392              IDHor2, IDVer2, ID1D2, ID2D2          EntR2
393              IDHor4                                MeanHor4
394              IDVer4                                MeanVer4
395              ID1D4                                 Mean1D4
396              ID2D4                                 Mean2D4
397              IDHor4, IDVer4, ID1D4, ID2D4          MeanM4
398              IDHor4, IDVer4, ID1D4, ID2D4          MeanS4
399              IDHor4, IDVer4, ID1D4, ID2D4          MeanN4
400              IDHor4, IDVer4, ID1D4, ID2D4          MeanX4
401              IDHor4, IDVer4, ID1D4, ID2D4          MeanR4
402              IDHor4                                ConHor4
403              IDVer4                                ConVer4
404              ID1D4                                 Con1D4
405              ID2D4                                 Con2D4
406              IDHor4, IDVer4, ID1D4, ID2D4          ConM4
407              IDHor4, IDVer4, ID1D4, ID2D4          ConS4
408              IDHor4, IDVer4, ID1D4, ID2D4          ConN4
409              IDHor4, IDVer4, ID1D4, ID2D4          ConX4
410              IDHor4, IDVer4, ID1D4, ID2D4          ConR4
411              IDHor4                                ASMHor4
412              IDVer4                                ASMVer4
413              ID1D4                                 ASM1D4
414              ID2D4                                 ASM2D4
415              IDHor4, IDVer4, ID1D4, ID2D4          ASMM4
416              IDHor4, IDVer4, ID1D4, ID2D4          ASMS4
417              IDHor4, IDVer4, ID1D4, ID2D4          ASMN4
418              IDHor4, IDVer4, ID1D4, ID2D4          ASMX4
419              IDHor4, IDVer4, ID1D4, ID2D4          ASMR4
420              IDHor4                                EntHor4

421              IDVer4                                EntVer4
422              ID1D4                                 Ent1D4
423              ID2D4                                 Ent2D4
424              IDHor4, IDVer4, ID1D4, ID2D4          EntM4
425              IDHor4, IDVer4, ID1D4, ID2D4          EntS4
426              IDHor4, IDVer4, ID1D4, ID2D4          EntN4
427              IDHor4, IDVer4, ID1D4, ID2D4          EntX4
428              IDHor4, IDVer4, ID1D4, ID2D4          EntR4
429              IDHor8                                MeanHor8
430              IDVer8                                MeanVer8
431              ID1D8                                 Mean1D8
432              ID2D8                                 Mean2D8
433              IDHor8, IDVer8, ID1D8, ID2D8          MeanM8
434              IDHor8, IDVer8, ID1D8, ID2D8          MeanS8
435              IDHor8, IDVer8, ID1D8, ID2D8          MeanN8
436              IDHor8, IDVer8, ID1D8, ID2D8          MeanX8
437              IDHor8, IDVer8, ID1D8, ID2D8          MeanR8
438              IDHor8                                ConHor8
439              IDVer8                                ConVer8
440              ID1D8                                 Con1D8
441              ID2D8                                 Con2D8
442              IDHor8, IDVer8, ID1D8, ID2D8          ConM8
443              IDHor8, IDVer8, ID1D8, ID2D8          ConS8
444              IDHor8, IDVer8, ID1D8, ID2D8          ConN8
445              IDHor8, IDVer8, ID1D8, ID2D8          ConX8
446              IDHor8, IDVer8, ID1D8, ID2D8          ConR8
447              IDHor8                                ASMHor8
448              IDVer8                                ASMVer8
449              ID1D8                                 ASM1D8
450              ID2D8                                 ASM2D8
452              IDHor8, IDVer8, ID1D8, ID2D8          ASMS8
453              IDHor8, IDVer8, ID1D8, ID2D8          ASMN8
454              IDHor8, IDVer8, ID1D8, ID2D8          ASMX8
455              IDHor8, IDVer8, ID1D8, ID2D8          ASMR8
456              IDHor8                                EntHor8
457              IDVer8                                EntVer8
458              ID1D8                                 Ent1D8
459              ID2D8                                 Ent2D8
460              IDHor8, IDVer8, ID1D8, ID2D8          EntM8
461              IDHor8, IDVer8, ID1D8, ID2D8          EntS8
462              IDHor8, IDVer8, ID1D8, ID2D8          EntN8
463              IDHor8, IDVer8, ID1D8, ID2D8          EntX8
464              IDHor8, IDVer8, ID1D8, ID2D8          EntR8
465              IT (QUADRANT 1), IT (QUADRANT 2),     MaxR10-90
                 IT (QUADRANT 3), IT (QUADRANT 4)
466              IT (QUADRANT 1), IT (QUADRANT 2),     RanR10-90
                 IT (QUADRANT 3), IT (QUADRANT 4)
467              IT (QUADRANT 1), IT (QUADRANT 2),     MinCF0
                 IT (QUADRANT 3), IT (QUADRANT 4)
468              IT (QUADRANT 1), IT (QUADRANT 2),     RanCF0
                 IT (QUADRANT 3), IT (QUADRANT 4)
469              IT (QUADRANT 1), IT (QUADRANT 2),     MaxStDev
                 IT (QUADRANT 3), IT (QUADRANT 4)
470              IT (QUADRANT 1), IT (QUADRANT 2),     RanStDev
                 IT (QUADRANT 3), IT (QUADRANT 4)
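Features 465-470 appear to apply a single-histogram statistic to the IT histogram of each of the four quadrants and then take the maximum, minimum or range over the quadrants, as their names suggest. The sketch below illustrates this only for MaxStDev (469) and RanStDev (470), since StDev is the one underlying statistic whose meaning is unambiguous here. Splitting the sample area into four equal quadrants, using the population standard deviation, and the function names are assumptions.

```python
import numpy as np


def quadrants(area):
    """Split a sample area into four quadrants (assumed: equal halves in each
    direction). Their numbering does not matter for a maximum or a range."""
    a = np.asarray(area, dtype=float)
    r, c = a.shape[0] // 2, a.shape[1] // 2
    return [a[:r, :c], a[:r, c:], a[r:, :c], a[r:, c:]]


def quadrant_stdev_features(temps):
    """MaxStDev and RanStDev: maximum and range of the per-quadrant standard
    deviation of infrared temperature (shifting the temperatures by -160 K
    does not change a standard deviation)."""
    sds = [float(q.std()) for q in quadrants(temps)]
    return {"MaxStDev": max(sds), "RanStDev": max(sds) - min(sds)}
```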

I. Maximum Likelihood Classifier


The maximum likelihood classifier for a k-class problem (classes $w_1, \ldots, w_k$) assigns a sample observation with feature vector $\vec{X}$ to class $w_j$ iff

$$\ln P(w_j \mid \vec{X}) > \ln P(w_i \mid \vec{X}) \quad \text{for all } i \neq j,\ i = 1, \ldots, k \tag{1}$$

where $P(w_j \mid \vec{X})$ is the a posteriori probability that the observation with feature vector $\vec{X}$ belongs to class $w_j$. According to Bayes' rule, the a posteriori probability $P(w_j \mid \vec{X})$ is related to the conditional probability $p(\vec{X} \mid w_j)$ as follows:

$$P(w_j \mid \vec{X}) = \frac{p(\vec{X} \mid w_j)\, P(w_j)}{\sum_{i=1}^{k} p(\vec{X} \mid w_i)\, P(w_i)} \tag{2}$$

where the a priori probability $P(w_j)$ is given by

$$P(w_j) = \frac{n_j}{\sum_{i=1}^{k} n_i} \tag{3}$$

where $n_i$ is the number of samples in class $w_i$, $i = 1, \ldots, k$.

Since the denominator in the expression for $P(w_j \mid \vec{X})$ is common to all classes, the decision rule can be rephrased as follows. Assign an observation with feature vector $\vec{X}$ to class $w_j$ iff

$$\ln\bigl[p(\vec{X} \mid w_j)\, P(w_j)\bigr] > \ln\bigl[p(\vec{X} \mid w_i)\, P(w_i)\bigr] \quad \text{for all } i \neq j,\ i = 1, \ldots, k \tag{4}$$

Assuming that the class conditional densities can be modeled by multivariate normal densities,

$$p(\vec{X} \mid w_j) = \frac{1}{(2\pi)^{d/2}\, |\Sigma_j|^{1/2}} \exp\left[-\frac{1}{2} (\vec{X} - \vec{M}_j)^{T} \Sigma_j^{-1} (\vec{X} - \vec{M}_j)\right] \tag{5}$$

where
$d$ is the number of features selected at the decision node,
$\Sigma_j$ is the $d \times d$ covariance matrix for the class $w_j$,
$\vec{X}$ is the $d$-component column vector of feature values, and
$\vec{M}_j$ is the $d$-component column vector of feature means for the class $w_j$.

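If $X_{rc}$ is the $r$th component of the feature vector $\vec{X}$ for sample $c$ of class $w_j$, where $c = 1, \ldots, n_j$, $m_r$ is the $r$th component of the mean vector $\vec{M}_j$ for class $w_j$, and $\sigma_{rs} = \sigma_{sr}$ is the $(r, s)$th component of $\Sigma_j$, then

$$m_r = \frac{1}{n_j} \sum_{c=1}^{n_j} X_{rc} \tag{6}$$

$$\sigma_{rs} = \frac{1}{n_j - 1} \sum_{c=1}^{n_j} (X_{rc} - m_r)(X_{sc} - m_s) \tag{7}$$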
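Equations (1)-(7) can be sketched as a small NumPy classifier. This is an illustration, not the report's program: the class name is hypothetical, `np.cov` is used for the covariance estimate (it divides by n_j - 1 by default), and the logarithm of equation (5) is evaluated with `np.linalg.slogdet` for numerical stability.

```python
import numpy as np


class GaussianMaxLikelihood:
    """Decision rule (4) with Gaussian class densities (5), priors from the
    sample sizes (3), and sample means and covariances (6)-(7)."""

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        n = np.array([np.sum(y == c) for c in self.classes_])
        self.log_prior_ = np.log(n / n.sum())                          # eq. (3)
        self.means_ = [X[y == c].mean(axis=0) for c in self.classes_]  # eq. (6)
        self.covs_ = [np.atleast_2d(np.cov(X[y == c], rowvar=False))   # eq. (7)
                      for c in self.classes_]
        return self

    def log_scores(self, X):
        """ln[p(X|w_j) P(w_j)] for every row of X and every class w_j."""
        X = np.atleast_2d(np.asarray(X, dtype=float))
        d = X.shape[1]
        cols = []
        for log_prior, mean, cov in zip(self.log_prior_, self.means_, self.covs_):
            diff = X - mean
            _, logdet = np.linalg.slogdet(cov)
            maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
            cols.append(log_prior - 0.5 * (d * np.log(2 * np.pi) + logdet + maha))
        return np.column_stack(cols)

    def predict(self, X):
        """Assign each row to the class with the largest score (rule (4))."""
        return self.classes_[np.argmax(self.log_scores(X), axis=1)]
```

Working with log scores turns the product $p(\vec{X} \mid w_j) P(w_j)$ into a sum and avoids underflow when many features are selected at a node.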


The next three classifiers (multiclass one-against-the-rest, multiclass voting, and the Fisher two-class classifier with sample $P(w_j)$) classify a sample observation with feature vector $\vec{X}$ into class $w_j$ depending on the result of one or more two-class comparisons of the form

$$\ln P(w_j \mid \vec{X}) > \ln P(w_i \mid \vec{X}) \tag{8}$$

where, for each two-class comparison, the covariance matrices of both classes are assumed to be equal and are estimated by an averaged covariance matrix.


II. Multiclass One-Against-The-Rest

The multiclass one-against-the-rest classifier for a k-class problem assigns a sample observation with feature vector $\vec{X}$ to class $w_j$ iff one and only one of the following inequalities is satisfied, namely the $j$th:

$$\begin{aligned} \ln p(\vec{X} \mid w_1) &> \ln p(\vec{X} \mid \text{not } w_1) \\ &\;\;\vdots \\ \ln p(\vec{X} \mid w_k) &> \ln p(\vec{X} \mid \text{not } w_k) \end{aligned} \tag{9}$$

Otherwise the sample is rejected and no classification decision is made. The sample a priori probabilities $P(w_i)$ and $P(\text{not } w_i)$ are assumed to be equal for all $i = 1, \ldots, k$ and have therefore been dropped from equation (4) to obtain the above inequalities. $p(\vec{X} \mid w_j)$ is assumed to be normally distributed with mean vector $\vec{M}_j$ and covariance matrix $\bar{\Sigma}_j$, and $p(\vec{X} \mid \text{not } w_j)$ is assumed to be normally distributed with mean vector $\bar{\vec{M}}_j$ and the same covariance matrix $\bar{\Sigma}_j$ used to characterize $p(\vec{X} \mid w_j)$. If the components of the mean vectors $\vec{M}_j$ are given by equation (6) and the components of the covariance matrices $\Sigma_j$, $j = 1, \ldots, k$, are given by equation (7), then the mean vector $\bar{\vec{M}}_j$ and the covariance matrix $\bar{\Sigma}_j$ are given by

$$\bar{\vec{M}}_j = \frac{\vec{M}_1 + \cdots + \vec{M}_{j-1} + \vec{M}_{j+1} + \cdots + \vec{M}_k}{k - 1} \tag{10}$$

where $k$ is the number of classes, and

$$\bar{\Sigma}_j = \frac{1}{2}\left[\Sigma_j + \frac{\Sigma_1 + \cdots + \Sigma_{j-1} + \Sigma_{j+1} + \cdots + \Sigma_k}{k - 1}\right] \tag{11}$$
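A sketch of the one-against-the-rest decision for a single observation, given per-class means and covariance matrices from equations (6) and (7). Because the two densities in each test share one covariance matrix, comparing their logarithms reduces to comparing Mahalanobis distances: the normalising constants cancel. The shared matrix is taken here as half the class covariance plus half the average of the remaining class covariances; that choice, the function name, and returning `None` for a rejected sample are assumptions of this illustration.

```python
import numpy as np


def one_against_rest(x, means, covs):
    """Return the index j of the only class whose test
    ln p(x|w_j) > ln p(x|not w_j) succeeds, or None (reject) otherwise."""
    x = np.asarray(x, dtype=float)
    means = [np.asarray(m, dtype=float) for m in means]
    covs = [np.asarray(s, dtype=float) for s in covs]
    k = len(means)
    passed = []
    for j in range(k):
        rest = [i for i in range(k) if i != j]
        mean_rest = np.mean([means[i] for i in rest], axis=0)   # eq. (10)
        # Assumed shared covariance: half class j, half the average of the rest.
        cov_shared = 0.5 * (covs[j] + np.mean([covs[i] for i in rest], axis=0))
        inv = np.linalg.inv(cov_shared)
        d_in = (x - means[j]) @ inv @ (x - means[j])
        d_out = (x - mean_rest) @ inv @ (x - mean_rest)
        if d_in < d_out:       # same outcome as the j-th inequality in (9)
            passed.append(j)
    return passed[0] if len(passed) == 1 else None
```

A point midway between all class means, such as (5, 5) for means at (0, 0), (10, 0) and (0, 10), passes none of the tests and is rejected.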


III. Multiclass Voting

The multiclass voting classifier for a k-class problem determines, for each of the $\binom{k}{2} = k(k-1)/2$ two-class combinations of the k classes, which one of the two classes satisfies the inequality

$$\ln p(\vec{X} \mid w_j) > \ln p(\vec{X} \mid w_i), \quad i \neq j \tag{12}$$

The multiclass voting classifier then assigns a sample cloud observation with feature vector $\vec{X}$ to the class $w_v$ such that the number of two-class inequalities of the form (12) satisfied by $w_v$ is greater than the number satisfied by any other class $w_i$, $i \neq v$. If two classes are tied for the greatest number of votes, the sample is rejected. The sample a priori probabilities $P(w_j)$ and $P(w_i)$ are assumed to be equal for all $i, j = 1, \ldots, k$. $p(\vec{X} \mid w_i)$ is assumed to be normally distributed with mean vector $\vec{M}_i$ and covariance matrix $\Sigma_{ij}$, and $p(\vec{X} \mid w_j)$ is assumed to be normally distributed with mean vector $\vec{M}_j$ and the same covariance matrix $\Sigma_{ij}$, the averaged covariance of $\Sigma_i$ and $\Sigma_j$. The components of the mean vectors $\vec{M}_j$, $j = 1, \ldots, k$, are given by (6), and the components of the covariance matrices $\Sigma_j$, $j = 1, \ldots, k$, are given by (7). The covariance matrix $\Sigma_{ij}$ is then given by

$$\Sigma_{ij} = \frac{1}{2}\left(\Sigma_i + \Sigma_j\right) \tag{13}$$
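The voting rule can be sketched in the same style. For each of the k(k-1)/2 class pairs, the class with the larger log density under the pair's shared covariance (equivalently, the smaller Mahalanobis distance) receives one vote. The shared matrix is assumed here to be the simple average of the two class covariances, which matches equal priors; an exact tie within one pair is arbitrarily given to the second class. The function name and `None` for a rejected sample are illustrative choices.

```python
import numpy as np
from itertools import combinations


def multiclass_vote(x, means, covs):
    """Pairwise voting: one vote per class pair; None if the top count is tied."""
    x = np.asarray(x, dtype=float)
    means = [np.asarray(m, dtype=float) for m in means]
    covs = [np.asarray(s, dtype=float) for s in covs]
    votes = np.zeros(len(means), dtype=int)
    for i, j in combinations(range(len(means)), 2):     # the k(k-1)/2 pairs
        inv = np.linalg.inv(0.5 * (covs[i] + covs[j]))    # assumed averaged covariance
        d_i = (x - means[i]) @ inv @ (x - means[i])
        d_j = (x - means[j]) @ inv @ (x - means[j])
        # Smaller distance means larger log density under the shared matrix;
        # an exact tie within a pair goes to w_j (an arbitrary choice).
        votes[i if d_i < d_j else j] += 1
    top = np.flatnonzero(votes == votes.max())
    return int(top[0]) if top.size == 1 else None
```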


IV.  Fisher  Two-Class  Classifier  with  Sample  P(w.) 

The  Fisher  classifier  with  sample  P(w^)  for  a two- 
class  problem  (w^ , W£)  assigns  a sample  observation  with 


feature  vector  X to  class  w^  iff 


2. 


(14)  In  p(X/wi  )P(w1- ) > In  p(X/Wj)P(wj)  for  i / j , i = 1 

The a priori probabilities are computed from the sample sizes by equation (3). The densities $p(\vec{X}/w_1)$ and $p(\vec{X}/w_2)$ are assumed to be normal with mean vectors $M_1$ and $M_2$, respectively, and with the averaged covariance matrix $\Sigma_{12}$. The components of $M_j$, $j = 1, 2$, are given by equation (6), and the components of $\Sigma_j$, $j = 1, 2$, are given by equation (7). The averaged covariance matrix is then defined as

(15)  $\Sigma_{12} = P(w_1)\Sigma_1 + P(w_2)\Sigma_2$
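The sketch below implements the two-class rule (14) with the pooled covariance (15). Equation (3) is not reproduced in this section. The priors are therefore assumed to be the training-sample proportions $n_j/(n_1+n_2)$. The function name is hypothetical, and so is the choice to send an exact tie to class 2.

```python
import numpy as np

def fisher_two_class(x, mean1, mean2, cov1, cov2, n1, n2):
    """Assign x to class 1 or 2 by rule (14) with the pooled covariance of (15).

    n1, n2: training-sample sizes. The priors n_j / (n1 + n2) are an
    assumed form of equation (3).
    """
    p1 = n1 / (n1 + n2)
    p2 = n2 / (n1 + n2)
    pooled = p1 * cov1 + p2 * cov2  # Sigma_12 = P(w1) Sigma_1 + P(w2) Sigma_2
    inv = np.linalg.inv(pooled)

    def log_score(mean, prior):
        # ln[p(x/w) P(w)] up to terms shared by both classes (same covariance)
        diff = x - mean
        return -0.5 * diff @ inv @ diff + np.log(prior)

    # Strict ">" as in (14); an exact tie falls to class 2 here (assumption).
    return 1 if log_score(mean1, p1) > log_score(mean2, p2) else 2
```

The two densities share $\Sigma_{12}$, so only the Mahalanobis term and $\ln P(w_j)$ differ between the two sides of (14). A larger prior for one class moves the decision boundary toward the less frequent class.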


V.  Fisher  Distance  Feature  Selection  Criterion 

The Fisher Distance feature selection criterion for single features is defined as

(16)  $J = \dfrac{|m_1 - m_2|}{\sigma_1 + \sigma_2}$

where $m_1$ and $m_2$ are the means of the selected feature for classes $w_1$ and $w_2$, respectively, and $\sigma_1$ and $\sigma_2$ are the corresponding standard deviations. The Fisher Distance measures the separation between the two classes along the given feature. Larger values of $J$ indicate better separation.
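The sketch below implements (16) and uses it to rank individual features by class separation. The text does not say whether the standard deviations divide by $n$ or by $n-1$. NumPy's default (`ddof=0`, i.e. $n$) is assumed. The names `fisher_distance` and `rank_features` are hypothetical.

```python
import numpy as np

def fisher_distance(feature_1, feature_2):
    """Equation (16): |m1 - m2| / (sigma1 + sigma2) for one feature, two classes."""
    m1, m2 = np.mean(feature_1), np.mean(feature_2)
    s1, s2 = np.std(feature_1), np.std(feature_2)  # population std (ddof=0), assumed
    return abs(m1 - m2) / (s1 + s2)

def rank_features(X1, X2):
    """Rank feature columns by Fisher Distance, largest separation first.

    X1, X2: arrays of shape (n_samples, n_features) for classes w1 and w2.
    Returns (feature indices in decreasing J order, list of J values).
    """
    scores = [fisher_distance(X1[:, f], X2[:, f]) for f in range(X1.shape[1])]
    order = sorted(range(len(scores)), key=lambda f: scores[f], reverse=True)
    return order, scores
```

`rank_features` is one plausible way to apply the criterion to feature selection. The report's own selection procedure is not shown in this section.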


Appendix  C 


Confusion  Matrices 


Note: In the confusion matrices for Table 36, the user categories "MIX1" and "MIX2" are row headings. They represent, respectively, the training set of labelled mixed samples used to train the classifier at level 2 and the identical training set of labelled mixed samples used to train the classifier at level 1 of Tree 2 (Figure 6). Rows 2 and 3 therefore repeat the same information about the automatic classification categories (column headings) into which the mixed samples were classified. The column heading "MIX1" denotes the terminal automatic classification bin at level 2 for mixed samples. The column heading "MIX2" denotes the terminal automatic classification bin at level 1 for mixed samples. In experiment 1 (Table 36), 12 mixed samples were incorrectly classified as low at level 1. Of these 12 samples, the entry under the column heading "MIX1" shows that 6 were classified into "MIX1" at level 2. This makes the total number of correctly classified mixed samples 73 + 6 = 79.
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